AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh). Get installation instructions:

    npx skills add inference-sh/skills@agent-tools

    infsh login
    # Create avatar video from image + audio
    infsh app run bytedance/omnihuman-1-5 --input '{
      "image_url": "https://portrait.jpg",
      "audio_url": "https://speech.mp3"
    }'

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| OmniHuman 1.5 | bytedance/omnihuman-1-5 | Multi-character, best quality |
| OmniHuman 1.0 | bytedance/omnihuman-1-0 | Single character |
| Fabric 1.0 | falai/fabric-1-0 | Image talks with lipsync |
| PixVerse Lipsync | falai/pixverse-lipsync | Highly realistic |

Search Avatar Apps

    infsh app list --search "omnihuman"
    infsh app list --search "lipsync"
    infsh app list --search "fabric"

Examples

OmniHuman 1.5 (Multi-Character)

    infsh app run bytedance/omnihuman-1-5 --input '{
      "image_url": "https://portrait.jpg",
      "audio_url": "https://speech.mp3"
    }'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

    infsh app run falai/fabric-1-0 --input '{
      "image_url": "https://face.jpg",
      "audio_url": "https://audio.mp3"
    }'

PixVerse Lipsync

    infsh app run falai/pixverse-lipsync --input '{
      "image_url": "https://portrait.jpg",
      "audio_url": "https://speech.mp3"
    }'

Generates highly realistic lipsync from any audio.

Full Workflow: TTS + Avatar
    # 1. Generate speech from text
    infsh app run infsh/kokoro-tts --input '{
      "prompt": "Welcome to our product demo. Today I will show you..."
    }' > speech.json

    # 2. Create avatar video with the speech
    infsh app run bytedance/omnihuman-1-5 --input '{
      "image_url": "https://presenter-photo.jpg",
      "audio_url": ""
    }'

Dubbing Workflow

    # 1. Transcribe original video
    infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

    # 2. Translate text (manually or with an LLM)

    # 3. Generate speech in new language
    infsh app run infsh/kokoro-tts --input '{"text": ""}' > new_speech.json

    # 4. Lipsync the original video with new audio
    infsh app run infsh/latentsync-1-6 --input '{
      "video_url": "https://original-video.mp4",
      "audio_url": ""
    }'

Use Cases

- Marketing: Product demos with AI presenter
- Education: Course videos, explainers
- Localization: Dub content in multiple languages
- Social Media: Consistent virtual influencer
- Corporate: Training videos, announcements

Tips

- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
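When it is unclear which model suits a given portrait, running the same inputs through several apps can help. The sketch below only prints the commands so they can be reviewed before anything is run. `compare_commands` is a hypothetical helper, and the app list is limited to the three apps whose examples in this document use the `image_url` + `audio_url` input shape.

```shell
#!/bin/sh
# App IDs taken from the Examples section; all three are shown there with the
# same image_url + audio_url input.
AVATAR_APPS="bytedance/omnihuman-1-5 falai/fabric-1-0 falai/pixverse-lipsync"

# compare_commands is a hypothetical helper, not part of infsh.
# $1 = image URL, $2 = audio URL. Prints one command per app without running it.
compare_commands() {
  for app in $AVATAR_APPS; do
    printf "infsh app run %s --input '{\"image_url\": \"%s\", \"audio_url\": \"%s\"}'\n" \
      "$app" "$1" "$2"
  done
}

# Usage:
#   compare_commands "https://portrait.jpg" "https://speech.mp3"
```

Printing instead of executing keeps the comparison free of side effects. The output can be piped to `sh` once the commands look right.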