安装
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill wan-2-7
复制
Wan 2.7 — Pro Pack on RunComfy
runcomfy.com
·
Text-to-video
·
GitHub
Wan-AI's
Wan 2.7
— flagship video model with multi-reference conditioning and audio-driven lip-sync — hosted on the
RunComfy Model API
.
npx skills
add
agentspace-so/runcomfy-skills
--skill
wan-2-7
-g
When to pick this model (vs siblings)
You want
Use
Lip-sync video to an audio track you supply
Wan 2.7
(
audio_url
)
Multi-reference fine motion control
Wan 2.7
Smooth transitions, accurate motion physics
Wan 2.7
Currently-#1 blind-vote video model
HappyHorse 1.0
Multi-modal cinematic with image+video+audio refs + in-pass voice generation
Seedance 2.0 Pro
Cinematic motion editing on existing footage
Kling Video O1
Ultra-fast iteration
LTX 2
If the user said "Wan" / "Wan 2.7" / "wan-ai" / "alibaba video" explicitly, route here regardless.
Prerequisites
RunComfy CLI
—
npm i -g @runcomfy/cli
RunComfy account
—
runcomfy login
opens a browser device-code flow.
CI / containers
— set
RUNCOMFY_TOKEN=
instead of
runcomfy login
.
Endpoints + input schema
wan-ai/wan-2-7/text-to-video
Field
Type
Required
Default
Notes
prompt
string
yes
—
Up to ~5000 chars / ~1500 tokens.
audio_url
string
no
—
WAV/MP3, 3–30s, ≤15MB.
Drives lip-sync.
Omit → background music auto-generated.
aspect_ratio
enum
no
16:9
16:9
,
9:16
,
1:1
,
4:3
,
3:4
.
resolution
enum
no
1080p
720p
or
1080p
.
duration
enum
no
5
2–15 (whole seconds).
negative_prompt
string
no
—
Up to 500 chars. Concrete issues to avoid.
enable_prompt_expansion
bool
no
true
Auto-rewrites short prompts. Disable for literal control.
seed
int
no
—
0..2^31-1. Reuse for variants.
How to invoke
Default (5s 1080p 16:9, prompt-expanded):
runcomfy run wan-ai/wan-2-7/text-to-video
\
--input
'{"prompt": ""}'
\
--output-dir
<
absolute/path
>
Audio-driven lip-sync (your own track):
runcomfy run wan-ai/wan-2-7/text-to-video
\
--input
'{
"prompt": "Medium close-up of the spokesperson, warm key light, locked tripod, slight breathing motion.",
"audio_url": "https://.../voiceover.mp3",
"duration": 12,
"aspect_ratio": "9:16"
}'
\
--output-dir
<
absolute/path
>
Literal control (no auto-expansion):
runcomfy run wan-ai/wan-2-7/text-to-video
\
--input
'{
"prompt": "",
"enable_prompt_expansion": false,
"negative_prompt": "no subtitles, no flicker, no distorted hands"
}'
\
--output-dir
<
absolute/path
>
Prompting — what actually works
Camera + motion in plain English.
"Slow dolly in", "locked tripod, low angle", "handheld follow", "crane move from above". Front-load the shot.
One primary action per clip.
Don't pile up multiple competing actions. Pick the beat: "she turns, then smiles" not "she turns AND smiles AND a bus passes AND...".
Use
negative_prompt
for concrete issues.
Good: "no subtitles, no watermark, no flicker". Bad (vague): "no bad lighting".
Prompt expansion is on by default.
Short prompts get auto-rewritten by the model. For terse / literal prompts (e.g. brand-strict ad copy), disable with
enable_prompt_expansion: false
.
Audio specs matter.
audio_url
must be 3–30s, ≤15MB, WAV/MP3. Out-of-range files reject. Match audio length to clip duration.
Iterate seeds.
Reuse the same seed when you want consistent output across variants of the same prompt. Change seed for genuine variety.
Anti-patterns:
Static-frame descriptions → motion will be vague.
Vague negatives ("no bad colors") → ignored.
Audio outside the 3–30s / 15MB / WAV-MP3 spec → rejected.
Prompts > 5000 chars / 1500 tokens → degraded output.
Where it shines
Use case
Why Wan 2.7
Lip-synced ads with custom voiceover
audio_url
accepts your track
Multi-language dub variants
Same prompt, different
audio_url
per language
Multi-reference motion control
Up to 5 reference media (image / video / voice)
Smooth transitions + motion physics
Strong physics-aware motion priors
Negative-prompted clean output
Targeted issue exclusion
Sample prompts (verified to produce strong results)
Page example (product showcase):
Cinematic medium shot of a product on a marble surface, soft studio
lighting, slow subtle camera push-in, shallow depth of field, premium
commercial look, crisp 1080p detail
Lip-synced spokesperson (with
audio_url
):
Medium close-up of a confident spokesperson in a softly-lit recording
booth, leaning slightly toward the camera, locked tripod, shallow depth
of field, warm key light from camera-left.
Vertical platform-native:
9:16 vertical short. A barista pulls a single espresso shot, steam
rising into morning sun, rich crema slowly forming. Close-up handheld,
shallow DOF, warm cafe ambience.
Limitations
Duration cap 15s.
For longer narratives, stitch multiple calls.
No native 4K
— 1080p ceiling.
Aspect ratios
— only the 5 documented values.
Audio specs
— 3–30s, ≤15MB, WAV/MP3 only.
Reference media cap 5
(image + video + voice combined).
For in-pass voice generation (no separate audio track), use Seedance 2.0 Pro
— Wan accepts audio rather than generating it.
Exit codes
code
meaning
0
success
64
bad CLI args
65
bad input JSON / schema mismatch
69
upstream 5xx
75
retryable: timeout / 429
77
not signed in or token rejected
Full reference:
docs.runcomfy.com/cli/troubleshooting
.
How it works
The skill invokes
runcomfy run wan-ai/wan-2-7/text-to-video
with a JSON body matching the schema. The CLI POSTs to
https://model-api.runcomfy.net/v1/models/wan-ai/wan-2-7/text-to-video
, polls the request, fetches the result, and downloads any
.runcomfy.net
/
.runcomfy.com
URL into
--output-dir
.
Ctrl-C
cancels the remote request before exit.
Security & Privacy
Token storage
:
runcomfy login
writes the API token to
~/.config/runcomfy/token.json
with mode 0600 (owner-only read/write). Set
RUNCOMFY_TOKEN
env var to bypass the file entirely in CI / containers.
Input boundary
the user prompt is passed as a JSON string to the CLI via
--input
. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Third-party content
image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
Outbound endpoints
only
model-api.runcomfy.net
(request submission) and
*.runcomfy.net
/
*.runcomfy.com
(download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap
the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
← 返回排行榜