Gemini Transcription

Transcribe audio/video using Google Gemini API with structured markdown output.

YouTube Video Workflow

Important: Check for existing captions before transcribing:

Check captions: yt-dlp --list-subs "URL"
Has caption → Use /omnicaptions:download to get existing captions (better quality)
No caption → Transcribe directly with URL (don't download first!)

Confirm with user: Before transcribing, ask if they want to check for existing captions first.

URL & Local File Support

Gemini natively supports YouTube URLs - no need to download, just pass the URL directly:

YouTube URL (recommended, no download needed)

omnicaptions transcribe "https://www.youtube.com/watch?v=VIDEO_ID"

Local files

omnicaptions transcribe video.mp4

Note: Output defaults to current directory unless user specifies -o.

When to Use Video URLs - YouTube, direct video links (Gemini native support) Transcribing podcasts, interviews, lectures Need verbatim transcript with timestamps and speaker labels Want auto-generated chapters from content Mixed-language audio (code-switching preserved) When NOT to Use Video has existing captions - Use /omnicaptions:download to get existing captions first Need real-time streaming transcription (use Whisper) Audio >2 hours (Gemini upload limit) Want translation instead of transcription Quick Reference Method Description transcribe(path) Transcribe file or URL (sync) translate(in, out, lang) Translate captions write(text, path) Save text to file Setup pip install https://github.com/lattifai/omni-captions-skills/raw/main/packages/lattifai_captions-0.1.0.tar.gz pip install https://github.com/lattifai/omni-captions-skills/raw/main/packages/omnicaptions-0.1.0.tar.gz

API Key

Priority: GEMINI_API_KEY env → .env file → ~/.config/omnicaptions/config.json

If not set, ask user: Please enter your Gemini API key (get from https://aistudio.google.com/apikey):

Then run with -k . Key will be saved to config file automatically.

CLI Usage

IMPORTANT: CLI requires subcommand (transcribe, translate, convert)

Transcribe (auto-output to same directory)

omnicaptions transcribe video.mp4 # → ./video_GeminiUnd.md omnicaptions transcribe "https://youtu.be/abc" # → ./abc_GeminiUnd.md

Specify output file or directory

omnicaptions transcribe video.mp4 -o output/ # → output/video_GeminiUnd.md omnicaptions transcribe video.mp4 -o my.md # → my.md

Options

omnicaptions transcribe -m gemini-3-pro-preview video.mp4 omnicaptions transcribe -l zh video.mp4 # Force Chinese

Option Description -k, --api-key Gemini API key (auto-prompted if missing) -o, --output Output file or directory (default: auto) -m, --model Model (default: gemini-3-flash-preview) -l, --language Force language (zh, en, ja) -t, --translate LANG Translate to language (one-step) --bilingual Bilingual output (with -t) -v, --verbose Verbose output Bilingual Captions (Optional)

If user requests bilingual output, add -t --bilingual:

omnicaptions transcribe video.mp4 -t zh --bilingual

For precise timing, use separate workflow: transcribe → LaiCut → translate (see Related Skills).

Output Format

[00:00:00] Introduction
[00:02:15] Main Topic

[00:00:00] Introduction

Host: Welcome to the show. [00:00:01]

Guest: Thanks for having me. [00:00:05]

[Applause] [00:00:08]

Key features:

[HH:MM:SS] Title chapter headers

Speaker: labels (auto-detected) [HH:MM:SS] timestamp at paragraph end [Event] for non-speech (laughter, music) Common Mistakes Mistake Fix No API key error Use -k YOUR_KEY or follow the prompt Empty response Check file format (mp3/mp4/wav/m4a supported) Upload timeout File too large (>2GB); split first Wrong language Use -l en to force language Related Skills Skill Use When /omnicaptions:convert Convert output to SRT/VTT/ASS /omnicaptions:translate Translate (Gemini API or Claude native) /omnicaptions:download Download video/audio first Workflow Examples

Basic transcription

omnicaptions transcribe video.mp4

→ video_GeminiUnd.md

Precise timing needed: transcribe → LaiCut align → convert

omnicaptions transcribe video.mp4 omnicaptions LaiCut video.mp4 video_GeminiUnd.md

→ video_GeminiUnd_LaiCut.json

omnicaptions convert video_GeminiUnd_LaiCut.json -o video_GeminiUnd_LaiCut.srt

Note: For translation, use /omnicaptions:translate (default: Claude, optional: Gemini API)

omnicaptions-transcribe

安装