Google Cloud Text-to-Speech

Converts text and documents into audio using the Google Cloud TTS API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.

Setup

Provide the API key via the GOOGLE_TTS_API_KEY environment variable or in skills/google-tts/config.json as {"api_key": "..."}. ffmpeg is required for multi-chunk documents. Optional: pip install PyPDF2 python-docx for PDF/DOCX input.

Commands

List Voices

python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json

Text-to-Speech

From text or document (PDF, DOCX, MD, TXT)

python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

With voice, rate, pitch, encoding options

python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
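For orientation, these options line up with fields of the Cloud TTS `text:synthesize` REST method (`voice.name`, `audioConfig.speakingRate`, `audioConfig.pitch`, `audioConfig.audioEncoding`). How google_tts.py actually builds its request is not shown here, so the sketch below is only an assumption-based illustration: a hypothetical helper, `build_synthesize_request`, that assembles an equivalent JSON body and checks rate and pitch against the ranges listed under Reference. Deriving the language code from the voice name is also an assumption that holds for names like en-US-Neural2-F.

```python
import json

API_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"
ENCODINGS = {"MP3", "LINEAR16", "OGG_OPUS"}


def build_synthesize_request(text, voice="en-US-Neural2-F", rate=1.0,
                             pitch=0.0, encoding="MP3"):
    """Hypothetical helper: return the JSON body for a text:synthesize call."""
    if not 0.25 <= rate <= 4.0:
        raise ValueError(f"speaking rate {rate} outside 0.25-4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError(f"pitch {pitch} outside -20.0 to 20.0 semitones")
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding {encoding}")
    # Assumption: the language code is the first two dash-separated parts
    # of the voice name, e.g. "en-US-Neural2-F" -> "en-US".
    language_code = "-".join(voice.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": {
            "audioEncoding": encoding,
            "speakingRate": rate,
            "pitch": pitch,
        },
    }


body = build_synthesize_request("Hello world", rate=0.9)
print(json.dumps(body, indent=2))
```

To call the API for real, this body would be POSTed as JSON to `API_URL` with the key passed in the `key` query parameter; the response's `audioContent` field carries the audio as a base64 string.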

Podcast Generation

Takes a JSON script with alternating speakers and synthesizes each speaker's turns with a different voice.

[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]
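The script is a JSON array of objects, each with a `speaker` and a `text` key. Whether google_tts.py validates its input is not documented, so here is a hypothetical pre-flight check (`validate_script` is an illustrative name, not part of the skill) that reports malformed turns and flags any turn longer than the 4000-character guideline given in the podcast workflow.

```python
import json

MAX_TURN_CHARS = 4000  # guideline from the "Podcast from Document" workflow


def validate_script(raw):
    """Hypothetical checker: parse a podcast script and return a list of problems."""
    turns = json.loads(raw)
    if not isinstance(turns, list) or not turns:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        # Both keys must be present and hold non-blank strings.
        for key in ("speaker", "text"):
            if not isinstance(turn.get(key), str) or not turn[key].strip():
                problems.append(f"turn {i}: missing or empty '{key}'")
        text = turn.get("text")
        if isinstance(text, str) and len(text) > MAX_TURN_CHARS:
            problems.append(f"turn {i}: {len(text)} chars exceeds {MAX_TURN_CHARS}")
    return problems
```

An empty list means the script passed; anything else is a list of human-readable problems to fix before running the podcast command.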

python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3

python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
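The docs do not say how the script decides which speaker gets --voice1 and which gets --voice2. One plausible rule, assumed here rather than taken from google_tts.py, is order of first appearance: the first speaker seen gets voice1, the second gets voice2. The hypothetical `assign_voices` below applies that rule and turns the script into a synthesis plan of (voice, text) pairs.

```python
def assign_voices(turns, voice1, voice2):
    """Hypothetical: map speakers to voices in order of first appearance."""
    voices = {}
    for turn in turns:
        speaker = turn["speaker"]
        if speaker not in voices:
            if len(voices) == 2:
                raise ValueError(f"more than two speakers: {speaker!r}")
            # First new speaker gets voice1, the second gets voice2.
            voices[speaker] = voice1 if not voices else voice2
    # Pair each turn's text with the voice assigned to its speaker.
    return [(voices[t["speaker"]], t["text"]) for t in turns]


turns = [
    {"speaker": "host1", "text": "Welcome to our podcast!"},
    {"speaker": "host2", "text": "Thanks for having me..."},
]
for voice, text in assign_voices(turns, "en-US-Neural2-J", "en-US-Neural2-H"):
    print(voice, text)
```

Keying on first appearance rather than on the literal names "host1"/"host2" keeps the mapping working if a script uses other speaker labels.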

Workflow

Single-Voice Narration

If the user provides a file path, use --file. For generated content, first write clean prose to /tmp/tts_input.md.

Default voice: en-US-Neural2-D (male) or en-US-Neural2-F (female). Use Neural2 for the best balance of quality and cost.

Generate:

python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3

Report the output file's location and size. Write output to ~/Downloads/ by default.
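"Clean prose" is not defined further; one reasonable reading is Markdown with the syntax a voice would read aloud or stumble over removed. A minimal regex-based sketch of that cleanup (`markdown_to_prose` is hypothetical, not part of the skill) drops fenced code, unwraps links and images to their text, and strips heading, bullet, emphasis, and inline-code markers. The regexes are deliberately simple and will miss nested or unusual Markdown.

```python
import re


def markdown_to_prose(md):
    """Hypothetical cleanup: strip common Markdown syntax so TTS reads plain prose."""
    text = re.sub(r"```.*?```", "", md, flags=re.DOTALL)          # drop fenced code
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)         # images -> alt text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)          # links -> link text
    text = re.sub(r"^#{1,6}[ \t]*", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)  # bullets
    text = re.sub(r"\*\*|\*|`", "", text)                          # emphasis, inline code
    text = re.sub(r"\n{3,}", "\n\n", text)                          # collapse blank runs
    return text.strip()
```

The returned string would be written to /tmp/tts_input.md and passed to the tts command with --file.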

Podcast from Document

Extract text:

python skills/google-tts/scripts/extract.py /path/to/document.pdf

Generate a two-host conversation script as JSON:

Aim for natural discussion, not verbatim reading: Host 1 leads, Host 2 reacts and analyzes.

Include an intro and outro, vary turn lengths, and keep each turn under 4000 characters.

Write the script to /tmp/podcast_script.json.

Generate:

python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3

Clean up temp files.
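The extraction step relies on extract.py, whose implementation is not shown. As a rough idea of what extraction by file type involves, here is a hypothetical `extract_text` that dispatches on the file extension: it uses PyPDF2's `PdfReader` and python-docx's `Document` (the optional packages from Setup) for PDF and DOCX, and reads MD and TXT files directly. The optional imports are deferred so plain-text files work without them installed.

```python
from pathlib import Path


def extract_text(path):
    """Hypothetical extractor: return plain text from a PDF, DOCX, MD, or TXT file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # optional: pip install PyPDF2
        reader = PdfReader(str(p))
        # extract_text() may yield nothing for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # optional: pip install python-docx
        document = docx.Document(str(p))
        return "\n".join(para.text for para in document.paragraphs)
    if suffix in {".md", ".txt"}:
        return p.read_text(encoding="utf-8")
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Scanned PDFs without a text layer will come back mostly empty, since no OCR is involved.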

Reference

Recommended voice type: Neural2 (~$4/1M chars, high quality)
Speaking rate: 0.25-4.0 (0.85-0.95 works well for technical content)
Pitch: -20.0 to 20.0 semitones
Encodings: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
API limit: 5000 bytes per request. The script automatically chunks longer input at sentence boundaries.
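The 5000-byte cap is measured in bytes, not characters, so non-ASCII text fills a request sooner. Below is a minimal sketch of sentence-boundary chunking under that cap. The split regex and the character-level fallback for a single over-long sentence are assumptions, not necessarily what google_tts.py does; each resulting chunk would be synthesized separately and the clips joined, which is why ffmpeg is needed for multi-chunk documents.

```python
import re

MAX_BYTES = 5000  # per-request input limit stated above


def chunk_text(text, max_bytes=MAX_BYTES):
    """Split text into chunks of at most max_bytes UTF-8 bytes, at sentence ends."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(sentence.encode("utf-8")) <= max_bytes:
            current = sentence
            continue
        # Fallback: hard-split an oversized sentence on character boundaries
        # so multi-byte characters are never cut in half.
        piece, piece_bytes = "", 0
        for ch in sentence:
            ch_bytes = len(ch.encode("utf-8"))
            if piece and piece_bytes + ch_bytes > max_bytes:
                chunks.append(piece)
                piece, piece_bytes = "", 0
            piece += ch
            piece_bytes += ch_bytes
        current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences greedily keeps the number of requests low while never splitting mid-sentence unless a single sentence alone exceeds the limit.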
