Caption Format Conversion

Convert between 30+ caption/caption formats using lattifai-captions.

⚡ YouTube Workflow

1. Transcribe YouTube video directly

omnicaptions transcribe "https://youtube.com/watch?v=VIDEO_ID" -o transcript.md

2. Convert to any format

omnicaptions convert transcript.md -o output.srt omnicaptions convert transcript.md -o output.ass omnicaptions convert transcript.md -o output.vtt

When to Use Converting SRT to VTT, ASS, TTML, etc. Converting Gemini markdown transcript to standard caption formats Converting YouTube VTT (with word-level timestamps) to other formats Batch format conversion When NOT to Use Need transcription (use /omnicaptions:transcribe) Need translation (use /omnicaptions:translate) Setup pip install https://github.com/lattifai/omni-captions-skills/raw/main/packages/lattifai_captions-0.1.0.tar.gz pip install https://github.com/lattifai/omni-captions-skills/raw/main/packages/omnicaptions-0.1.0.tar.gz

Quick Reference Format Extension Read Write SRT .srt ✓ ✓ VTT .vtt ✓ ✓ ASS/SSA .ass ✓ ✓ TTML .ttml ✓ ✓ Gemini MD .md ✓ ✓ JSON .json ✓ ✓ TXT .txt ✓ ✓

Full list: SRT, VTT, ASS, SSA, TTML, DFXP, SBV, SUB, LRC, JSON, TXT, TSV, Audacity, Audition, FCPXML, EDL, and more.

CLI Usage

Convert (auto-output to same directory, only changes extension)

omnicaptions convert input.srt -t vtt # → ./input.vtt omnicaptions convert transcript.md # → ./transcript.srt

Specify output file or directory

omnicaptions convert input.srt -o output/ # → output/input.srt omnicaptions convert input.srt -o output.vtt # → output.vtt

Specify format explicitly

omnicaptions convert input.txt -o out.srt -f txt -t srt

ASS Style Presets

When converting to ASS format, use --style to apply preset styles:

omnicaptions convert input.srt -o output.ass --style default # White text, bottom omnicaptions convert input.srt -o output.ass --style top # White text, top omnicaptions convert input.srt -o output.ass --style bilingual # White + Yellow (for bilingual) omnicaptions convert input.srt -o output.ass --style yellow # Yellow text, bottom

Preset Position Line 1 Line 2 Use Case default Bottom White White Standard captions top Top White White When bottom is occupied bilingual Bottom White Yellow Bilingual captions (原文 + 译文) yellow Bottom Yellow Yellow High visibility Bilingual Example

If your SRT has two-line captions like:

1 00:00:01,000 --> 00:00:03,000 Hello World 你好世界

Use --style bilingual or custom colors:

Preset: white + yellow

omnicaptions convert bilingual.srt -o output.ass --style bilingual

Custom colors: green English + yellow Chinese

omnicaptions convert bilingual.srt -o output.ass --line1-color "#00FF00" --line2-color "#FFFF00"

Mix preset with custom line2 color

omnicaptions convert bilingual.srt -o output.ass --style default --line2-color "#FF6600"

Custom Color Options Option Description --line1-color "#RRGGBB" First line (original) color --line2-color "#RRGGBB" Second line (translation) color

Common colors: #FFFFFF (white), #FFFF00 (yellow), #00FF00 (green), #00FFFF (cyan), #FF6600 (orange)

Font Size and Resolution

Font size is auto-calculated based on video resolution. Resolution is detected from (priority order):

--resolution argument (e.g., 1080p, 4k, 1920x1080) --video argument (uses ffprobe to detect) .meta.json file (saved by omnicaptions download) Default: 1080p

Auto-detect from .meta.json (saved by download command)

omnicaptions convert abc123.en.srt -o abc123.en.ass --karaoke

Specify resolution directly

omnicaptions convert input.srt -o output.ass --resolution 4k omnicaptions convert input.srt -o output.ass --resolution 720p omnicaptions convert input.srt -o output.ass --resolution 1920x1080

Detect from video file (uses ffprobe)

omnicaptions convert input.srt -o output.ass --video video.mp4

Override auto-calculated fontsize

omnicaptions convert input.srt -o output.ass --resolution 4k --fontsize 80

Resolution PlayRes Auto FontSize 480p 854×480 24 720p 1280×720 32 1080p 1920×1080 48 (default) 2K 2560×1440 64 4K 3840×2160 96 Karaoke Mode

Generate karaoke subtitles with word-level highlighting. Requires word-level timing (use LaiCut alignment first).

Basic karaoke (sweep effect - gradual fill)

omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke

Different effects

omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke sweep # Gradual fill (default) omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke instant # Instant highlight omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.ass --karaoke outline # Outline then fill

LRC karaoke (enhanced word timestamps)

omnicaptions convert lyrics_LaiCut.json -o lyrics_LaiCut_karaoke.lrc --karaoke

Effect ASS Tag Description sweep \kf Gradual fill from left to right (default) instant \k Instant word highlight outline \ko Outline fills, then text fills Karaoke Workflow

1. Align with LaiCut (get word-level timing in JSON)

omnicaptions LaiCut audio.mp3 lyrics.txt

2. Convert to karaoke ASS

omnicaptions convert lyrics_LaiCut.json -o karaoke.ass --karaoke

Or combine with style

omnicaptions convert lyrics_LaiCut.json -o karaoke.ass --karaoke --style yellow

Python Usage from omnicaptions import Caption

Load any format

cap = Caption.read("input.srt")

Write to any format

cap.write("output.vtt") cap.write("output.ass") cap.write("output.ttml")

Common Mistakes Mistake Fix Format not detected Use --from / --to flags Missing timestamps Source format must have timing info Encoding error Specify encoding="utf-8" Related Skills Skill Use When /omnicaptions:transcribe Need transcript from audio/video /omnicaptions:translate Translate with Gemini API /omnicaptions:translate Translate with Claude (no API key) /omnicaptions:download Download video/captions first Workflow Examples

Transcribe → Convert → Translate (with Claude)

/omnicaptions:transcribe video.mp4 /omnicaptions:convert video_GeminiUnd.md -o video.srt /omnicaptions:translate video.srt -l zh --bilingual

omnicaptions-convert

安装