seedance-v2

Installs: 100K
Rank: #138

Install

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill seedance-v2
Seedance 2.0 Pro — Pro Pack on RunComfy
runcomfy.com · Seedance 2.0 Pro · GitHub
ByteDance Seedance 2.0 Pro is a multimodal cinematic video generator with native lip-synced audio, hosted on the RunComfy Model API.
npx skills add agentspace-so/runcomfy-skills --skill seedance-v2 -g
When to pick this model (vs siblings)
Seedance 2.0 Pro's distinct strength is multi-modal cinematic short-form video: it combines character images, scene videos, and reference audio into one coherent shot. Pick it when fidelity to a reference identity or scene matters and you want native lip-sync.
| You want | Use |
|---|---|
| Lip-synced spokesperson / dialogue ad | Seedance 2.0 Pro |
| Multi-modal references (image + video + audio) | Seedance 2.0 Pro |
| Brand-consistent multi-language narrative | Seedance 2.0 Pro |
| Currently-#1 blind-vote video quality | HappyHorse 1.0 |
| Audio-driven lip-sync from your own track | Wan 2.7 (audio_url) |
| Motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |
If the user said "Seedance" / "Seedance 2" / "ByteDance video" explicitly, route here regardless.
Prerequisites
- RunComfy CLI: npm i -g @runcomfy/cli
- RunComfy account: runcomfy login opens a browser device-code flow.
- CI / containers: set RUNCOMFY_TOKEN= instead of running runcomfy login.
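A pre-flight check along these lines can catch a missing CLI or token before a run is attempted. This is a minimal sketch, not part of the RunComfy tooling: the function names are hypothetical, the token path is the one given under Security & Privacy in this document, and letting the environment variable take precedence over the file is an assumption based on the statement that RUNCOMFY_TOKEN bypasses the file.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Relative token path and variable name as described in this document.
TOKEN_FILE = Path(".config") / "runcomfy" / "token.json"
TOKEN_ENV = "RUNCOMFY_TOKEN"


def auth_source(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return "env", "file", or None depending on where a token could come from."""
    if env.get(TOKEN_ENV):
        return "env"  # assumption: the env var wins over the token file
    if (home / TOKEN_FILE).is_file():
        return "file"
    return None


def check_prerequisites() -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = []
    if shutil.which("runcomfy") is None:
        problems.append("runcomfy CLI not found on PATH (npm i -g @runcomfy/cli)")
    if auth_source(os.environ, Path.home()) is None:
        problems.append("no token: run `runcomfy login` or set RUNCOMFY_TOKEN")
    return problems
```

The check only confirms that a token source exists; it does not verify that the token is still accepted by the service.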
Endpoints + input schema
bytedance/seedance-v2/pro
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | | CN ≤ 500 chars OR EN ≤ 1000 words. |
| image_url | array | no | [] | 0–9 references (JPEG/PNG/WebP/BMP/TIFF/GIF). |
| video_url | array | no | [] | 0–3 clips (MP4/MOV), 2–15s each. |
| audio_url | array | no | [] | 0–3 audio refs (WAV/MP3), 2–15s, < 15MB each. |
| aspect_ratio | enum | no | adaptive | adaptive, 16:9, 9:16, 4:3, 3:4, 1:1, 21:9. |
| duration | int | no | 5 | 4–15 (whole seconds). |
| resolution | enum | no | 720p | 480p or 720p. |
| generate_audio | bool | no | true | In-pass synchronized speech / SFX / music. |
| seed | int | no | | Reproducibility. |
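The schema can be checked locally before a request is submitted, which avoids a round trip for obviously malformed input. The sketch below is an illustration only: validate_input is a hypothetical helper, the limits are copied from the schema for this endpoint, and the prompt length limit and media duration / size limits are not checked because they would require language detection and file inspection.

```python
from typing import Any

ASPECT_RATIOS = {"adaptive", "16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
RESOLUTIONS = {"480p", "720p"}
MAX_REFS = {"image_url": 9, "video_url": 3, "audio_url": 3}


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; empty means the payload looks valid."""
    errors = []
    if not isinstance(payload.get("prompt"), str):
        errors.append("prompt: required string")
    for field, limit in MAX_REFS.items():
        refs = payload.get(field, [])
        if not isinstance(refs, list) or len(refs) > limit:
            errors.append(f"{field}: expected a list of at most {limit} URLs")
    duration = payload.get("duration", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        errors.append("duration: expected a whole number of seconds from 4 to 15")
    ratio = payload.get("aspect_ratio", "adaptive")
    if not isinstance(ratio, str) or ratio not in ASPECT_RATIOS:
        errors.append("aspect_ratio: not one of the documented values")
    resolution = payload.get("resolution", "720p")
    if not isinstance(resolution, str) or resolution not in RESOLUTIONS:
        errors.append("resolution: expected 480p or 720p")
    if not isinstance(payload.get("generate_audio", True), bool):
        errors.append("generate_audio: expected a boolean")
    seed = payload.get("seed")
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        errors.append("seed: expected an integer")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.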
How to invoke
Default (text only, 5s, 720p with audio):
runcomfy run bytedance/seedance-v2/pro \
  --input '{"prompt": ""}' \
  --output-dir <absolute/path>
Lip-synced ad with character reference (image-stable, text-evolves):
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Medium close-up. The woman explains today'\''s special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
    "image_url": ["https://.../barista-headshot.jpg"],
    "duration": 8,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
Multi-modal (image + video + audio refs):
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café from video 1, voice tone matches audio 1.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-ref.mp3"]
  }' \
  --output-dir <absolute/path>
The CLI submits the request, polls for completion, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URLs into --output-dir.
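When the command is built from a script rather than typed by hand, the apostrophe escaping needed in the shell examples can be avoided by serializing the payload with json.dumps and passing arguments as a list. A sketch under stated assumptions: build_command is a hypothetical helper, /tmp/seedance-out is a placeholder output directory, and only the flags shown in this document (--input, --output-dir) are used.

```python
import json
import shlex

MODEL = "bytedance/seedance-v2/pro"


def build_command(payload: dict, output_dir: str) -> list[str]:
    """Build the argv list for `runcomfy run`, one list element per argument."""
    return ["runcomfy", "run", MODEL, "--input", json.dumps(payload), "--output-dir", output_dir]


payload = {
    "prompt": "Medium close-up. The woman explains today's special in a warm friendly tone.",
    "image_url": ["https://.../barista-headshot.jpg"],
    "duration": 8,
    "aspect_ratio": "9:16",
}
argv = build_command(payload, "/tmp/seedance-out")  # hypothetical output directory
print(shlex.join(argv))  # shell-quoted form, useful for logging or copy-paste
# To execute: subprocess.run(argv, check=False), then inspect the return code.
```

Because the list is handed to the process directly, no shell ever parses the prompt, so quotes inside it need no escaping.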
Prompting — what actually works
- Image vs text division. This is the single most important rule. Stable identity (face, costume, brand mark, logo) → put in image_url. Evolving narrative (action, mood, lighting, camera) → put in prompt. Trying to verbally describe a face in detail wastes tokens and produces drift.
- Camera + motion in plain language. "Medium close-up", "slow push-in", "handheld follow", "locked-off wide" all work as directives. Combine: "Medium close-up. Slow push-in over 3 seconds. Handheld, slight breathing motion."
- Audio direction with generate_audio: true. State the tone: "warm friendly conversational", "calm instructional", "crisp newsroom delivery". For ambient sound: "gentle cafe chatter, distant traffic, no foreground music".
- Reference media specs. Videos must be 2–15s; audio must be 2–15s and < 15MB. Out-of-range files are rejected. Match the aspect ratio of your references to the output to avoid crops.
Anti-patterns:
- Mixing radically different aesthetic refs (watercolor + photoreal) → confuses the model.
- Conflicting style cues in the prompt → simplify by removing contradictions.
- Trying to describe stable identity verbally → use image_url instead.
- Asking for >15s clips → 422; segment into multiple calls.
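For the last anti-pattern, a longer piece has to be planned as several calls that each stay within the 4–15 second range. One way to pick the segment lengths is sketched below; split_duration is a hypothetical helper, and choosing the fewest near-equal segments is a design assumption rather than anything the service requires.

```python
import math

MIN_SECONDS, MAX_SECONDS = 4, 15  # per-call duration range from the input schema


def split_duration(total_seconds: int) -> list[int]:
    """Split a total length into the fewest near-equal segments of 4-15 seconds."""
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"total must be at least {MIN_SECONDS} seconds")
    count = math.ceil(total_seconds / MAX_SECONDS)
    base, extra = divmod(total_seconds, count)
    # The first `extra` segments get one additional second so the sum is exact.
    return [base + 1 if i < extra else base for i in range(count)]


print(split_duration(40))  # [14, 13, 13]
```

Each value can then be used as the duration of one call. Keeping the same image_url references across calls is the natural way to hold identity between segments, though continuity at the cuts is not something this sketch addresses.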
Where it shines
| Use case | Why Seedance 2.0 Pro |
|---|---|
| Spokesperson / dialogue ads | Native in-pass lip-sync, no separate TTS step |
| Brand-consistent multi-language narratives | Image refs hold identity; text drives translation |
| Cinematic short-form film previs | Camera-shot grammar + multi-modal refs |
| Ad creatives with reference music / VO tone | Audio refs guide voice / mood without locking lip-sync |
| Reproducible variant testing | Seed control + fixed schema |
Sample prompts (verified to produce strong results)
Default playground example:
Golden hour on a quiet cafe terrace: a barista wipes the counter, then
looks up and explains today's special in a friendly tone, natural
lip-sync. Medium close-up, slow push-in; warm side light, soft bokeh
through glass, gentle cafe ambience and subtle film grain.
Multi-modal lip-sync (text + image):
Same person as image 1 in a softly-lit recording booth, leaning into
the mic, says: "We just shipped the biggest update of the year."
Calm conversational tone. Medium close-up, locked tripod, shallow DOF,
warm key light from camera-left.
Limitations
- Duration 4–15s: no longer clips on this endpoint.
- Resolution ceiling 720p on the playground variant.
- Reference media specs: videos / audio must be 2–15s; audio < 15MB.
- Lip-sync quality: depends on prompt clarity; not guaranteed perfect under all conditions.
- No @-syntax for character binding: relies on image refs + prompt alignment.
Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
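A wrapper script can use these codes to decide whether to retry. The sketch below is illustrative only: describe_exit and next_delay are hypothetical helpers, the backoff parameters are arbitrary, and treating only code 75 as retryable is an assumption taken from the table's wording (an upstream 5xx, code 69, might also be worth retrying in practice).

```python
from typing import Optional

EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}  # assumption: only the code the table labels "retryable"


def describe_exit(code: int) -> str:
    """Map an exit code to the meaning listed in the table."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def next_delay(code: int, attempt: int, base: float = 2.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to stop."""
    if code not in RETRYABLE:
        return None
    # Exponential backoff: base, 2*base, 4*base, ... limited to `cap`.
    return min(cap, base * 2 ** (attempt - 1))
```

A caller would run the command, pass the return code and attempt number to next_delay, and sleep for the returned time or give up when it is None.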
How it works
The skill invokes runcomfy run bytedance/seedance-v2/pro with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/bytedance/seedance-v2/pro, polls the request, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
Security & Privacy
- Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: the image / video / audio URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
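The download whitelist and size cap can be illustrated with a small check. This is a sketch of the idea, not the CLI's actual code: the function names are hypothetical, requiring HTTPS is an assumption, and reading *.runcomfy.net as "subdomains only, not the bare domain" is also an assumption.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")
MAX_DOWNLOAD_BYTES = 2 * 1024**3  # the 2 GiB cap


def is_allowed_download(url: str) -> bool:
    """True if the URL is HTTPS and its host is a subdomain of an allowed domain."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    # Match on the parsed hostname, never on the raw URL string, so that
    # paths or query strings containing an allowed domain do not pass.
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)


def within_size_cap(content_length: int) -> bool:
    """True if a reported download size does not exceed the cap."""
    return 0 <= content_length <= MAX_DOWNLOAD_BYTES
```

Including the leading dot in each suffix is what rejects look-alike hosts such as a domain that merely ends in "runcomfy.net" without being a subdomain of it.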