gemini-tts

Install

npx skills add https://github.com/akrindev/google-studio-skills --skill gemini-tts
Gemini Text-to-Speech
Generate natural-sounding speech from text with Gemini's TTS models, using executable scripts that support multiple voices and multi-speaker conversations.
When to Use This Skill
Use this skill when you need to:
Convert text to natural speech
Create audio for podcasts, audiobooks, or videos
Generate multi-speaker conversations
Stream audio for long content
Choose from multiple voice options
Create accessible audio content
Generate voiceovers for presentations
Batch convert text to audio files
Available Scripts
scripts/tts.py
Purpose
Convert text to speech using Gemini TTS models
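Each conversion is presumably a single generate_content request to a Gemini TTS model that asks for audio output. The sketch below assumes the google-genai Python SDK, which accepts a plain dict in place of types.GenerateContentConfig. The field names follow Google's speech-generation guide. The helpers build_speech_config and prebuilt_voice are hypothetical and are not taken from scripts/tts.py.

```python
from typing import Dict, Optional


def prebuilt_voice(name: str) -> dict:
    """Voice config selecting one of Gemini's prebuilt voices (hypothetical helper)."""
    return {"prebuilt_voice_config": {"voice_name": name}}


def build_speech_config(voice: str = "Kore",
                        speakers: Optional[Dict[str, str]] = None) -> dict:
    """Build a GenerateContentConfig-shaped dict for single- or multi-speaker TTS."""
    if speakers:
        # One entry per speaker; names must match the speaker labels in the text.
        speech = {"multi_speaker_voice_config": {"speaker_voice_configs": [
            {"speaker": name, "voice_config": prebuilt_voice(v)}
            for name, v in speakers.items()
        ]}}
    else:
        speech = {"voice_config": prebuilt_voice(voice)}
    # Ask the model for audio output instead of text.
    return {"response_modalities": ["AUDIO"], "speech_config": speech}
```

With the SDK installed and an API key set, pass the dict as config= to client.models.generate_content(model="gemini-2.5-flash-preview-tts", contents=text). The raw PCM audio is then in response.candidates[0].content.parts[0].inline_data.data.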
When to use:
Any text-to-speech conversion
Multi-speaker conversation generation
Streaming audio for long texts
Voiceovers for content creation
Accessible audio generation
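The command-line surface in the Key parameters table can be expressed with Python's argparse. This is a hypothetical reconstruction, not the actual source of scripts/tts.py. The defaults (Kore, tts_output, audio/, the flash model) are inferred from the documented workflows.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the documented scripts/tts.py parameters."""
    parser = argparse.ArgumentParser(description="Gemini text-to-speech")
    parser.add_argument("text", help="Text to convert (required)")
    parser.add_argument("--voice", "-v", default="Kore", help="Voice name")
    parser.add_argument("--output", "-o", default="tts_output",
                        help="Base name for the output file")
    parser.add_argument("--output-dir", default="audio",
                        help="Output directory for audio")
    parser.add_argument("--no-timestamp", action="store_true",
                        help="Disable the automatic timestamp suffix")
    parser.add_argument("--model", "-m", default="gemini-2.5-flash-preview-tts",
                        help="TTS model")
    parser.add_argument("--stream", "-s", action="store_true",
                        help="Enable streaming")
    parser.add_argument("--speakers",
                        help='Multi-speaker mapping, e.g. "Joe:Kore,Jane:Puck"')
    return parser
```

For example, build_parser().parse_args(["Hello", "-v", "Puck"]) yields a namespace with voice="Puck" and every other option at its default.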
Key parameters:
| Parameter | Description | Example |
|---|---|---|
| text | Text to convert (required, positional) | "Hello, world!" |
| --voice, -v | Voice name | Kore |
| --output, -o | Base name for the output file | welcome |
| --output-dir | Output directory for audio | audio/ |
| --no-timestamp | Disable the automatic timestamp suffix | Flag (no value) |
| --model, -m | TTS model | gemini-2.5-flash-preview-tts |
| --stream, -s | Enable streaming | Flag (no value) |
| --speakers | Multi-speaker mapping | "Joe:Kore,Jane:Puck" |
Output
Path to the generated WAV audio file
Workflows
Workflow 1: Basic Text-to-Speech
python scripts/tts.py "Hello, world! Have a wonderful day."
Best for: Quick audio generation, simple messages
Voice: Kore (default, clear and professional)
Output: audio/tts_output_YYYYMMDD_HHMMSS.wav (auto timestamp)
Workflow 2: Choose Different Voice
python scripts/tts.py "Welcome to our podcast about technology trends" --voice Puck --output welcome
Best for: Friendly, conversational content
Voice options: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
Output: audio/welcome_YYYYMMDD_HHMMSS.wav
Workflow 3: Multi-Speaker Conversation
python scripts/tts.py "TTS the following conversation: Joe: How's it going today? Jane: Not too bad, how about you? Joe: I'm working on a new project. Jane: Sounds exciting, tell me more!" --speakers "Joe:Kore,Jane:Puck" --output conversation
Best for: Dialogues, interviews, role-playing content
Format: Conversation text in which each line is labeled with a speaker name
Speaker names in the text must match the names in --speakers; each speaker's lines are voiced with the mapped voice
Output: audio/conversation_YYYYMMDD_HHMMSS.wav
Workflow 4: Long Content with Streaming
python scripts/tts.py "This is a very long text that would benefit from streaming..." --stream --output long-form
Best for: Podcasts, audiobooks, long articles
Streaming: Processes audio in chunks as it is generated
Output: audio/long-form_YYYYMMDD_HHMMSS.wav
Workflow 5: Professional Voiceover
python scripts/tts.py "Welcome to our quarterly earnings presentation. Today we'll discuss our growth metrics and future plans." --voice Charon --output voiceover
Best for: Corporate content, presentations, formal announcements
Voice: Charon (deep, authoritative)
Use when: A professional, serious tone is required
Workflow 6: Custom Output Directory
python scripts/tts.py "Save to specific folder." --output-dir ./my-projects/podcasts/ --output episode1
Best for: Organized project structures
The directory is created automatically if it doesn't exist
Output: ./my-projects/podcasts/episode1_YYYYMMDD_HHMMSS.wav
Workflow 7: Content Creation Pipeline (Text → Audio)

1. Generate script (gemini-text skill)

python skills/gemini-text/scripts/generate.py "Write a 2-minute podcast intro about sustainable energy"

2. Generate audio (this skill)

python scripts/tts.py "[Paste generated script]" --voice Fenrir --output podcast-intro

3. Use in video or podcast

Best for: Podcasts, audiobooks, video narration
Combines with: gemini-text for script generation
Workflow 8: Accessible Content
python scripts/tts.py "Welcome to our accessible website. This audio describes our main navigation options." --voice Aoede --output accessibility
Best for: Web accessibility, screen reader alternatives
Voice: Aoede (melodic, pleasant)
Use when: Making content accessible to visually impaired users
Workflow 9: Educational Content
python scripts/tts.py "Chapter 1: Introduction to Quantum Computing. Let's explore the fundamental principles..." --voice Zephyr --output chapter1
Best for: Educational materials, tutorials, e-learning
Voice: Zephyr (light, airy)
Combines well with: gemini-text for content generation
Workflow 10: Disable Timestamp
python scripts/tts.py "Fixed filename." --output my-audio --no-timestamp
Best for: Full control over the filename
Output: audio/my-audio.wav (no timestamp)
Use when: Generating files for specific naming schemes
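The naming rules in these workflows can be sketched as a small path builder: a base name, an optional _YYYYMMDD_HHMMSS suffix, a .wav extension, and a directory created on demand. build_output_path is hypothetical. Whether the real script uses local time is an assumption; the now parameter exists only so the timestamp can be pinned in tests.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_output_path(base: str = "tts_output", output_dir: str = "audio",
                      timestamp: bool = True,
                      now: Optional[datetime] = None) -> Path:
    """Return a path such as audio/welcome_20250101_093000.wav (hypothetical)."""
    directory = Path(output_dir)
    # Mirror the documented behaviour: create the directory if it is missing.
    directory.mkdir(parents=True, exist_ok=True)
    if timestamp:
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        base = f"{base}_{stamp}"
    return directory / f"{base}.wav"
```

Calling build_output_path("my-audio", timestamp=False) gives audio/my-audio.wav, matching Workflow 10.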
Parameters Reference
Model Selection
| Model | Quality | Speed | Best For |
|---|---|---|---|
| gemini-2.5-flash-preview-tts | Good | Fast | General use, high volume |
| gemini-2.5-pro-preview-tts | Higher | Slower | Premium content, voiceovers |
Voice Selection
| Voice | Characteristics | Best For |
|---|---|---|
| Kore | Clear, professional | Announcements, general purpose (default) |
| Puck | Friendly, conversational | Casual content, interviews |
| Charon | Deep, authoritative | Corporate, serious content |
| Fenrir | Warm, expressive | Storytelling, narratives |
| Aoede | Melodic, pleasant | Educational, accessibility |
| Zephyr | Light, airy | Gentle content, tutorials |
| Sulafat | Neutral, balanced | Documentaries, factual content |
Audio Format
| Specification | Value |
|---|---|
| Format | WAV (PCM) |
| Sample rate | 24,000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |
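The table describes raw 16-bit PCM at 24 kHz, mono. Google's own Gemini TTS examples receive exactly such raw samples and wrap them in a WAV header with Python's standard wave module. The sketch below shows that step, assuming the audio bytes are already in hand. save_pcm_as_wav is a hypothetical name and not necessarily what scripts/tts.py uses.

```python
import wave

SAMPLE_RATE = 24_000  # Hz, from the table above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit


def save_pcm_as_wav(path: str, pcm: bytes) -> float:
    """Wrap raw 16-bit little-endian PCM in a WAV header; return duration in seconds."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)
    frames = len(pcm) // (SAMPLE_WIDTH * CHANNELS)
    return frames / SAMPLE_RATE
```

At these settings, one second of audio is 24,000 samples × 2 bytes = 48,000 bytes. A minute of speech therefore comes to roughly 2.9 MB before any MP3/AAC conversion.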
Token Limits
| Limit | Type | Description |
|---|---|---|
| 8,192 | Input | Maximum input text tokens |
| 16,384 | Output | Maximum output audio tokens |
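To stay under the input limit, long text can be split before synthesis. Exact counts require the API (for example the SDK's client.models.count_tokens). This sketch instead assumes Google's rule of thumb of roughly 4 characters per token for English and greedily packs whole sentences into each chunk. split_text and its default budget are hypothetical.

```python
import re
from typing import List

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not an exact count


def split_text(text: str, max_tokens: int = 8_192) -> List[str]:
    """Greedily pack whole sentences into chunks under an estimated token budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single over-long sentence is hard-split at the character budget.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

A budget well below 8,192 (for example split_text(article, max_tokens=1_000)) also makes it less likely that a single clip hits the 16,384-token output limit. Each chunk can then be sent to scripts/tts.py separately.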
Output Interpretation
Audio File
Format: WAV (compatible with most players)
Mono channel (single audio track)
Sample rate: 24,000 Hz (adequate for speech; broadcast audio typically uses 48 kHz)
Can be converted to MP3/AAC if needed
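To confirm that a generated file has the properties listed above, or to check its length before conversion, the standard wave module can read the header back. describe_wav is a hypothetical helper.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report channels, sample rate, bit depth and duration of a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wf.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

For a file produced by this skill, the expected result is 1 channel, 24,000 Hz and 16-bit.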
Multi-Speaker Files
Single WAV file with multiple voices
Speakers take turns in the same mono track, each voiced with its mapped voice
Use the --speakers parameter to map speakers to voices
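The --speakers string can be validated with a few lines of parsing before any request is made. The sketch below handles the SpeakerName:VoiceName,Speaker2:Voice2 format. Tolerating spaces around names and rejecting duplicate speakers are assumptions, and parse_speakers is hypothetical.

```python
from typing import Dict


def parse_speakers(spec: str) -> Dict[str, str]:
    """Parse "Joe:Kore,Jane:Puck" into {"Joe": "Kore", "Jane": "Puck"}."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        speaker, sep, voice = pair.partition(":")
        speaker, voice = speaker.strip(), voice.strip()
        if not sep or not speaker or not voice:
            raise ValueError(
                f"Bad speaker entry {pair!r}; expected SpeakerName:VoiceName")
        if speaker in mapping:
            raise ValueError(f"Speaker {speaker!r} is mapped twice")
        mapping[speaker] = voice
    return mapping
```

Entries such as "Joe=Kore" or "Joe:" raise ValueError with a message naming the offending pair.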
Streaming Output
Audio processed in chunks during generation
Script shows "Streaming audio..." message
Useful for very long texts or real-time applications
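Streamed audio arrives as consecutive slices of raw PCM. The chunks should therefore be appended to a single WAV stream, not saved as separate WAV files and concatenated, which would embed extra headers. The sketch takes any iterable of byte chunks so it runs offline. With the google-genai SDK, the chunks would presumably come from the inline_data.data of each response yielded by client.models.generate_content_stream(...); that this is how scripts/tts.py streams is an assumption, and write_stream_to_wav is hypothetical.

```python
import wave
from typing import Iterable


def write_stream_to_wav(path: str, chunks: Iterable[bytes],
                        rate: int = 24_000) -> int:
    """Append raw 16-bit mono PCM chunks to one WAV file as they arrive.

    Returns the number of audio bytes written.
    """
    written = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        for chunk in chunks:
            if chunk:  # skip chunks that carry no audio
                wf.writeframes(chunk)
                written += len(chunk)
    return written
```

The wave module patches the header's length fields as frames are written. The file is valid even though the total size is not known up front.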
Common Issues
"google-genai not installed"
pip install google-genai
"Voice name not found"
Check voice name spelling
Use available voices: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
Voice names are case-sensitive
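A pre-flight check can catch both misspellings and wrong capitalization before any API call. This sketch hard-codes the seven voices listed in this document; Gemini offers additional prebuilt voices. When only the case differs, it suggests the correctly capitalized name. check_voice is a hypothetical helper, not part of the script.

```python
VOICES = ("Kore", "Puck", "Charon", "Fenrir", "Aoede", "Zephyr", "Sulafat")


def check_voice(name: str) -> str:
    """Return the voice name unchanged, or raise with a case-corrected hint."""
    if name in VOICES:
        return name
    by_lower = {v.lower(): v for v in VOICES}
    hint = by_lower.get(name.lower())
    message = f"Voice name not found: {name!r}"
    if hint:
        message += f" (did you mean {hint!r}?)"
    raise ValueError(message)
```

For example, check_voice("kore") raises ValueError suggesting 'Kore'.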
"No audio generated"
Check text is not empty
Verify text doesn't exceed token limit (8,192)
Try shorter text segments
Check API quota limits
"Multi-speaker format error"
Format: SpeakerName:VoiceName,Speaker2:Voice2
Separate speakers with commas
Use a colon between speaker and voice
Example: "Joe:Kore,Jane:Puck"
Keep to two speakers: Gemini's multi-speaker mode is documented for at most two speakers per request
"Output file already exists"
Script will overwrite existing files
Change the --output filename to avoid conflicts
Use unique names for batch generation
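For batch generation, each clip needs its own --output base name. This sketch builds one scripts/tts.py command per text with zero-padded, unique names and runs them only when dry_run is False. It assumes it is started from the directory that contains scripts/, and run_batch is a hypothetical helper.

```python
import subprocess
import sys
from typing import List


def run_batch(texts: List[str], prefix: str = "clip", voice: str = "Kore",
              dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one tts.py command per text."""
    commands = [
        [sys.executable, "scripts/tts.py", text,
         "--voice", voice, "--output", f"{prefix}_{i:03d}"]
        for i, text in enumerate(texts, start=1)
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch at the first failed conversion.
            subprocess.run(cmd, check=True)
    return commands
```

With the script's default timestamping left on, names stay unique across runs anyway. The numbering mainly keeps the clips in order.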
Audio quality issues
Check input text for unusual characters
Try different voice for better pronunciation
Consider splitting long text into smaller segments
Verify audio playback software compatibility
Best Practices
Voice Selection
Kore: General purpose, clear articulation
Puck: Conversational, engaging tone
Charon: Professional, authoritative
Fenrir: Emotional, storytelling
Aoede: Soft, gentle for accessibility
Zephyr: Educational, clear explanations
Text Preparation
Use natural language and punctuation
Include pauses with commas and periods
Spell out difficult words if needed
Break very long text into logical segments
Add speaker labels for multi-speaker content
Performance Optimization
Use streaming for very long texts
Generate shorter segments for better control
Use the flash model for faster generation
Batch process multiple files for efficiency
Quality Tips
Test different voices for your content type
Use punctuation to control pacing
Consider the context when selecting a voice
Listen to the output before final use
Label speakers clearly in multi-speaker text
Use Cases by Voice
| Voice | Ideal Use Cases |
|---|---|
| Kore | Announcements, navigation, general info |
| Puck | Podcasts, interviews, casual content |
| Charon | Corporate, news, formal presentations |
| Fenrir | Audiobooks, stories, emotional content |
| Aoede | Accessibility, educational, gentle content |
| Zephyr | Tutorials, explanations, guides |
| Sulafat | Documentaries, factual presentations |