image-to-video

Installs: 11.2K
Rank: #705

Install

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill image-to-video

Image-to-Video: Pro Pack on RunComfy
runcomfy.com · HappyHorse I2V · Wan 2.7 · Seedance 2.0 Pro · GitHub

Image-to-video, intent-routed. This skill doesn't lock you to one model: it picks the right i2v model in the RunComfy catalog based on what the user actually wants, whether that is portrait animation, custom-voiceover lip-sync, or multi-modal composition.

npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g

Pick the right model for the user's intent

| User intent | Model | Why |
| --- | --- | --- |
| Animate a portrait, keeping identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + audio_url | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + audio_url | Same shot, swap audio_url per language |
| Multi-modal: image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |

The agent reads this table, classifies the user's intent, and picks the matching subsection below.
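The routing decision can be pictured as a small function. This is only an illustrative sketch: the `Intent` fields are hypothetical names invented here, and checking the multi-modal case before the voiceover case is an assumption, not something the skill specifies. The model IDs are the ones given in the route sections of this document.

```python
from dataclasses import dataclass

# Model IDs as listed in the route sections of this document.
HAPPYHORSE = "happyhorse/happyhorse-1-0/image-to-video"
WAN_T2V = "wan-ai/wan-2-7/text-to-video"
SEEDANCE = "bytedance/seedance-v2/pro"


@dataclass
class Intent:
    """Hypothetical summary of what the user supplied (not part of the skill)."""
    has_voiceover_audio: bool = False  # user brought an MP3/WAV to lip-sync to
    has_reference_video: bool = False  # user wants the scene taken from a clip
    has_reference_audio: bool = False  # user wants the voice tone taken from audio


def pick_model(intent: Intent) -> str:
    """Return the model ID for the first matching group of table rows."""
    # Multi-modal composition: image plus reference video and/or audio.
    if intent.has_reference_video or intent.has_reference_audio:
        return SEEDANCE
    # Custom voiceover that should drive lip-sync.
    if intent.has_voiceover_audio:
        return WAN_T2V
    # Portrait, product, ambient audio, or anything unspecified.
    return HAPPYHORSE
```

With no flags set, `pick_model(Intent())` falls through to the HappyHorse default, matching the last row of the table.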
Prerequisites
- RunComfy CLI: npm i -g @runcomfy/cli
- RunComfy account: runcomfy login opens a browser device-code flow.
- CI / containers: set the RUNCOMFY_TOKEN environment variable.
- A source image URL: JPEG/PNG/WebP, min 300px, ≤10MB; aspect 1:2.5 to 2.5:1 (HappyHorse). Other models have similar specs.
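A local pre-flight check of the source image can catch spec violations before a request is spent. The sketch below is hedged: it assumes "min 300px" refers to the shorter side, that the aspect is width:height, and that 10MB means 10 × 1024 × 1024 bytes. None of these readings is confirmed by the text above, and `check_source_image` is a hypothetical helper, not part of the CLI.

```python
def check_source_image(width: int, height: int, size_bytes: int, fmt: str) -> list[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if fmt.lower() not in {"jpeg", "jpg", "png", "webp"}:
        problems.append(f"unsupported format: {fmt}")
    # Assumption: the 300px minimum applies to the shorter side.
    if min(width, height) < 300:
        problems.append("shorter side is below 300px")
    # Assumption: 10MB is interpreted as 10 * 1024 * 1024 bytes.
    if size_bytes > 10 * 1024 * 1024:
        problems.append("file is larger than 10MB")
    # Aspect must lie in 1:2.5 .. 2.5:1; integer math avoids float rounding.
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append("aspect ratio is outside 1:2.5 to 2.5:1")
    return problems
```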
Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation
Model: happyhorse/happyhorse-1-0/image-to-video · Arena rank #1 (Elo 1392)

Schema

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| image_url | string | yes | | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| prompt | string | yes | | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| resolution | enum | no | 1080P | 720P or 1080P. |
| duration | int | no | 5 | 3–15 seconds. |
| seed | int | no | 0 | Reuse for variant comparisons. |
| watermark | bool | no | true | Provider watermark toggle. |

Output aspect = input aspect. No independent reframing.
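As a rough illustration of the schema, the helper below assembles a HappyHorse input body and rejects values outside the documented ranges. `happyhorse_input` is a hypothetical name, and the prompt check covers only the 5000-character non-CJK limit; the 2500-character CJK limit is not implemented here.

```python
def happyhorse_input(image_url: str, prompt: str, resolution: str = "1080P",
                     duration: int = 5, seed: int = 0, watermark: bool = True) -> dict:
    """Build the JSON body for the HappyHorse route, using the schema defaults."""
    if resolution not in ("720P", "1080P"):
        raise ValueError("resolution must be 720P or 1080P")
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be a whole number of seconds in 3..15")
    # Only the non-CJK limit is checked; CJK prompts have a lower limit (2500).
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt must be 1..5000 characters")
    return {
        "image_url": image_url,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "seed": seed,
        "watermark": watermark,
    }
```

Keeping `seed` fixed while changing only the prompt is one way to apply the "reuse for variant comparisons" note.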
Invoke
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
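When the call is made from a script rather than typed into a shell, passing the arguments as a list sidesteps the quote escaping that an apostrophe in the prompt otherwise requires. The sketch below only builds the argument list, using the `run`, `--input` and `--output-dir` forms shown in this document; `runcomfy_argv` is a hypothetical helper.

```python
import json


def runcomfy_argv(model: str, payload: dict, output_dir: str) -> list[str]:
    """Build an argument list for `runcomfy run` without any shell quoting."""
    return [
        "runcomfy", "run", model,
        "--input", json.dumps(payload, ensure_ascii=False),
        "--output-dir", output_dir,
    ]


# Example (not executed here): hand the list to subprocess without shell=True,
# so the JSON reaches the CLI as a single argument.
#   import subprocess
#   result = subprocess.run(runcomfy_argv(model, payload, "/abs/path"))
#   print(result.returncode)
```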
Prompting tips
- Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.
- Don't restate the image: the model sees it. Focus tokens on what changes.
- Make preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".
- Describe lighting evolution: "rim light intensifying", "shadows shortening as camera rises".
- One beat per clip: a single primary motion (orbit OR dolly OR tilt OR character action).
Route 2: Wan 2.7 + audio_url (when the user has a custom voiceover)
Model: wan-ai/wan-2-7/text-to-video (NOT /image-to-video; Wan 2.7's t2v endpoint accepts an audio_url that drives lip-sync)
Note on i2v with Wan 2.7: Wan 2.7's primary i2v animation isn't on a dedicated endpoint here. For pure i2v (image animated by motion prompt only), prefer HappyHorse i2v. Use Wan 2.7 specifically when the user has a custom audio track they want lip-synced to a generated talking-head clip.
Schema (Wan 2.7 t2v with audio)

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| audio_url | string | yes (for lip-sync) | | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| aspect_ratio | enum | no | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4. |
| resolution | enum | no | 1080p | 720p or 1080p. |
| duration | enum | no | 5 | 2–15 (whole seconds). Match your audio length. |
| negative_prompt | string | no | | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| seed | int | no | | Reproducibility. |

Invoke
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
Prompting tips
- Describe the talking-head shot: framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.
- Match duration to audio length: the clip will be silent past the end of the audio if duration is too long.
- Use negative_prompt for issues: "no subtitles, no flicker, no distorted hands".
- For multi-language dubs: same prompt, swap audio_url per call. Lock seed for visual consistency across languages.
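The last two tips can be combined into a small planning step: derive a whole-second duration from each audio length and reuse one prompt and one seed across languages. This is a sketch under assumptions: rounding the audio length up is a choice made here, `wan_duration` and `dub_inputs` are hypothetical helpers, and the text does not say what happens to audio longer than the 15-second clip cap.

```python
import math


def wan_duration(audio_seconds: float) -> int:
    """Whole-second duration covering the audio, clamped to the 2..15 range."""
    # Rounding up is an assumption; it leaves under one second of silent tail.
    return max(2, min(15, math.ceil(audio_seconds)))


def dub_inputs(prompt: str, audio_by_lang: dict[str, tuple[str, float]],
               seed: int) -> dict[str, dict]:
    """One input body per language: same prompt and seed, different audio_url."""
    return {
        lang: {
            "prompt": prompt,
            "audio_url": url,
            "duration": wan_duration(seconds),
            "seed": seed,
        }
        for lang, (url, seconds) in audio_by_lang.items()
    }
```

Each resulting body would be sent as a separate `runcomfy run wan-ai/wan-2-7/text-to-video` call.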
Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)
Model: bytedance/seedance-v2/pro
Use when the user wants a single clip that combines a subject image + a scene from a reference video + voice tone from a reference audio.
Schema (Seedance 2.0 Pro, i2v-relevant fields)

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | CN ≤500 chars OR EN ≤1000 words. |
| image_url | array | yes (for i2v) | [] | 0–9 images. First is the primary subject. |
| video_url | array | no | [] | 0–3 reference clips (MP4/MOV), 2–15s each. |
| audio_url | array | no | [] | 0–3 reference audio (WAV/MP3), 2–15s, < 15MB each. |
| aspect_ratio | enum | no | adaptive | adaptive, 16:9, 9:16, 4:3, 3:4, 1:1, 21:9. |
| duration | int | no | 5 | 4–15 (whole seconds). |
| resolution | enum | no | 720p | 480p or 720p. |
| generate_audio | bool | no | true | In-pass synchronized speech / SFX / music. |
| seed | int | no | | Reproducibility. |

Invoke
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-tone.mp3"],
    "duration": 8
  }' \
  --output-dir <absolute/path>
Prompting tips
- Image vs text division: use image_url for what must stay stable (face, costume, brand); use prompt for what should evolve (action, mood, lighting).
- Number the refs in the prompt: "subject from image 1, lighting from video 1, voice from audio 1". Seedance routes cues correctly.
- Reference media specs: videos / audio must be 2–15s; audio < 15MB.
- Don't mix radically different aesthetics: if image 1 is a watercolor and video 1 is photoreal, output drifts.
Limitations
- Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s/15MB. Seedance: 720p ceiling on this template, 15s cap.
- No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).
- Brand-specific overrides: if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (wan-2-7, seedance-v2) instead of forcing it through here.
Exit codes

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
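Since only code 75 is marked retryable, a wrapper script might retry on that code alone and surface everything else immediately. The following is a minimal sketch, not behaviour of the CLI itself: `run_with_retry`, the attempt count, and the exponential backoff delays are all assumptions. `run_once` stands for any callable that runs the command and returns its exit code.

```python
import time

RETRYABLE_EXIT_CODE = 75  # "retryable: timeout / 429" in the table above


def run_with_retry(run_once, max_attempts: int = 3, base_delay: float = 2.0,
                   sleep=time.sleep) -> int:
    """Call run_once() until it returns a non-retryable code or attempts run out."""
    code = run_once()
    for attempt in range(1, max_attempts):
        if code != RETRYABLE_EXIT_CODE:
            break
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
        code = run_once()
    return code
```

Codes such as 64, 65 and 77 point to problems a retry will not fix (arguments, input JSON, sign-in), so they are returned on the first attempt.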
How it works
The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes runcomfy run with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
Security & Privacy
- Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
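To make the download whitelist concrete, the check below shows one way a host-suffix rule for *.runcomfy.net / *.runcomfy.com could be written. It is an illustration only and is not taken from the CLI's source: `is_allowed_download` is a hypothetical name, and requiring HTTPS is an added assumption.

```python
from urllib.parse import urlparse

# Leading dots mean only subdomains match, never look-alike hosts.
ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")


def is_allowed_download(url: str) -> bool:
    """True if the URL is HTTPS and its host is a subdomain of an allowed domain."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)
```

Matching on the parsed hostname rather than the raw string avoids being fooled by URLs such as `https://runcomfy.net.evil.example/`.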