Wan 2.7 — Pro Pack on RunComfy

runcomfy.com · Text-to-video · GitHub

Wan-AI's Wan 2.7, a flagship video model with multi-reference conditioning and audio-driven lip-sync, hosted on the RunComfy Model API.

    npx skills add agentspace-so/runcomfy-skills --skill wan-2-7 -g

When to pick this model (vs siblings)

| You want | Use |
|---|---|
| Lip-sync video to an audio track you supply | Wan 2.7 (`audio_url`) |
| Multi-reference fine motion control | Wan 2.7 |
| Smooth transitions, accurate motion physics | Wan 2.7 |
| Currently-#1 blind-vote video model | HappyHorse 1.0 |
| Multi-modal cinematic with image+video+audio refs + in-pass voice generation | Seedance 2.0 Pro |
| Cinematic motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |

If the user said "Wan" / "Wan 2.7" / "wan-ai" / "alibaba video" explicitly, route here regardless.

Prerequisites

- RunComfy CLI: `npm i -g @runcomfy/cli`
- RunComfy account: `runcomfy login` opens a browser device-code flow.
- CI / containers: set `RUNCOMFY_TOKEN=` instead of running `runcomfy login`.
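A CI job can check which credential source is available before it invokes the CLI. The following is a minimal sketch, assuming the token file path given under Security & Privacy (`~/.config/runcomfy/token.json`) and assuming that a non-empty environment variable takes precedence over the file. `credential_source` is a hypothetical helper, not part of the RunComfy CLI.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_source(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return "env", "file", or None depending on which credential is present.

    Assumption: a non-empty RUNCOMFY_TOKEN wins over the token file.
    """
    if env.get("RUNCOMFY_TOKEN"):
        return "env"
    if (home / ".config" / "runcomfy" / "token.json").is_file():
        return "file"
    return None


if __name__ == "__main__":
    source = credential_source(os.environ, Path.home())
    print(source or "not signed in: run `runcomfy login` or set RUNCOMFY_TOKEN")
```

Failing fast here avoids discovering a missing token only when the CLI exits with code 77.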

Endpoints + input schema

wan-ai/wan-2-7/text-to-video

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | none | Up to ~5000 chars / ~1500 tokens. |
| `audio_url` | string | no | none | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. Omit → background music auto-generated. |
| `aspect_ratio` | enum | no | `16:9` | `16:9`, `9:16`, `1:1`, `4:3`, `3:4`. |
| `resolution` | enum | no | `1080p` | `720p` or `1080p`. |
| `duration` | enum | no | `5` | 2–15 (whole seconds). |
| `negative_prompt` | string | no | none | Up to 500 chars. Concrete issues to avoid. |
| `enable_prompt_expansion` | bool | no | `true` | Auto-rewrites short prompts. Disable for literal control. |
| `seed` | int | no | none | 0..2^31-1. Reuse for variants. |
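Because a schema mismatch only surfaces as exit code 65 after the CLI runs, it can help to check a payload locally first. The sketch below encodes the limits from the schema above; `validate_input` and its error strings are hypothetical, and the audio file limits (length, size, format) are not checked because they cannot be read from a URL alone.

```python
from typing import Any, Dict, List

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p"}
MAX_SEED = 2**31 - 1


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_input(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    errors: List[str] = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str):
        errors.append("prompt: required string")
    elif len(prompt) > 5000:
        errors.append("prompt: longer than ~5000 chars")

    if "audio_url" in payload and not isinstance(payload["audio_url"], str):
        errors.append("audio_url: must be a string")

    if "aspect_ratio" in payload and payload["aspect_ratio"] not in ASPECT_RATIOS:
        errors.append("aspect_ratio: not one of the 5 documented values")

    if "resolution" in payload and payload["resolution"] not in RESOLUTIONS:
        errors.append("resolution: must be 720p or 1080p")

    if "duration" in payload:
        d = payload["duration"]
        if not _is_int(d) or not 2 <= d <= 15:
            errors.append("duration: must be a whole number of seconds, 2-15")

    neg = payload.get("negative_prompt")
    if neg is not None and (not isinstance(neg, str) or len(neg) > 500):
        errors.append("negative_prompt: string of up to 500 chars")

    if "enable_prompt_expansion" in payload and not isinstance(
        payload["enable_prompt_expansion"], bool
    ):
        errors.append("enable_prompt_expansion: must be a boolean")

    if "seed" in payload:
        s = payload["seed"]
        if not _is_int(s) or not 0 <= s <= MAX_SEED:
            errors.append("seed: must be an integer in 0..2^31-1")

    return errors
```

The server remains the authority on what is accepted; this check only catches the documented limits early.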

How to invoke

Default (5s 1080p 16:9, prompt-expanded):

    runcomfy run wan-ai/wan-2-7/text-to-video \
      --input '{"prompt": ""}' \
      --output-dir <absolute/path>

Audio-driven lip-sync (your own track):

    runcomfy run wan-ai/wan-2-7/text-to-video \
      --input '{
        "prompt": "Medium close-up of the spokesperson, warm key light, locked tripod, slight breathing motion.",
        "audio_url": "https://.../voiceover.mp3",
        "duration": 12,
        "aspect_ratio": "9:16"
      }' \
      --output-dir <absolute/path>

Literal control (no auto-expansion):

    runcomfy run wan-ai/wan-2-7/text-to-video \
      --input '{
        "prompt": "",
        "enable_prompt_expansion": false,
        "negative_prompt": "no subtitles, no flicker, no distorted hands"
      }' \
      --output-dir <absolute/path>
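When the same invocations are scripted, the JSON body is easier to build programmatically than to quote by hand. The sketch below assumes only the flags shown above (`run`, `--input`, `--output-dir`); `build_command` and `run` are hypothetical wrapper names.

```python
import json
import subprocess
from typing import Any, Dict, List

ENDPOINT = "wan-ai/wan-2-7/text-to-video"


def build_command(payload: Dict[str, Any], output_dir: str) -> List[str]:
    """Build the argv list for `runcomfy run`, using only the documented flags."""
    return [
        "runcomfy", "run", ENDPOINT,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]


def run(payload: Dict[str, Any], output_dir: str) -> int:
    """Invoke the CLI without a shell and return its exit code."""
    # Passing a list (no shell) hands the JSON to the CLI as a single argument,
    # so quotes or $ in the prompt are never interpreted by a shell.
    return subprocess.run(build_command(payload, output_dir)).returncode
```

For example, `run({"prompt": "Slow dolly in on a lighthouse at dusk", "duration": 8}, "/abs/out")` would request an 8-second clip with the remaining fields left at their defaults.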

Prompting — what actually works

- Camera + motion in plain English. "Slow dolly in", "locked tripod, low angle", "handheld follow", "crane move from above". Front-load the shot.
- One primary action per clip. Don't pile up multiple competing actions. Pick the beat: "she turns, then smiles", not "she turns AND smiles AND a bus passes AND...".
- Use `negative_prompt` for concrete issues. Good: "no subtitles, no watermark, no flicker". Bad (vague): "no bad lighting".
- Prompt expansion is on by default. Short prompts get auto-rewritten by the model. For terse / literal prompts (e.g. brand-strict ad copy), disable it with `enable_prompt_expansion: false`.
- Audio specs matter. `audio_url` must be 3–30s, ≤15MB, WAV/MP3. Out-of-range files are rejected. Match audio length to clip duration.
- Iterate seeds. Reuse the same seed when you want consistent output across variants of the same prompt. Change the seed for genuine variety.

Anti-patterns:

- Static-frame descriptions → motion will be vague.
- Vague negatives ("no bad colors") → ignored.
- Audio outside the 3–30s / 15MB / WAV-MP3 spec → rejected.
- Prompts > 5000 chars / 1500 tokens → degraded output.
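The seed advice above can be scripted in two ways: sweep seeds over one prompt for variety, or pin one seed across reworded prompts for consistency. This is a minimal sketch; `seed_variants` and `pin_seed` are hypothetical helpers that only prepare payloads and do not call the API.

```python
import random
from typing import Any, Dict, List


def seed_variants(base: Dict[str, Any], count: int, rng_seed: int = 0) -> List[Dict[str, Any]]:
    """Copy `base` `count` times, each with a distinct seed in 0..2^31-1."""
    rng = random.Random(rng_seed)  # a fixed rng_seed makes the sweep itself repeatable
    seeds = rng.sample(range(2**31), count)
    return [{**base, "seed": s} for s in seeds]


def pin_seed(payloads: List[Dict[str, Any]], seed: int) -> List[Dict[str, Any]]:
    """Give every payload the same seed, e.g. when only the prompt wording changes."""
    return [{**p, "seed": seed} for p in payloads]
```

Recording the seed next to each output makes it possible to regenerate a clip later with the same settings.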

Where it shines

| Use case | Why Wan 2.7 |
|---|---|
| Lip-synced ads with custom voiceover | `audio_url` accepts your track |
| Multi-language dub variants | Same prompt, different `audio_url` per language |
| Multi-reference motion control | Up to 5 reference media (image / video / voice) |
| Smooth transitions + motion physics | Strong physics-aware motion priors |
| Negative-prompted clean output | Targeted issue exclusion |
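The multi-language dub case amounts to holding every field constant except `audio_url`. The sketch below shows one way to prepare those payloads; `dub_payloads` is a hypothetical helper, the URLs are placeholders on a reserved example domain, and reusing one seed across languages is an illustrative choice rather than a documented requirement.

```python
from typing import Any, Dict


def dub_payloads(base: Dict[str, Any], tracks: Dict[str, str]) -> Dict[str, Dict[str, Any]]:
    """Map each language code to a payload that differs only in audio_url."""
    return {lang: {**base, "audio_url": url} for lang, url in tracks.items()}


base = {
    "prompt": "Medium close-up of a confident spokesperson, locked tripod, warm key light.",
    "aspect_ratio": "9:16",
    "seed": 1234,  # same seed for every language; an illustrative choice
}
# Placeholder URLs; substitute your own hosted voiceover tracks.
tracks = {
    "en": "https://example.com/voiceover-en.mp3",
    "es": "https://example.com/voiceover-es.mp3",
}
payloads = dub_payloads(base, tracks)
```

Each entry in `payloads` can then be passed to a separate `runcomfy run` call.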

Sample prompts (verified to produce strong results)

Page example (product showcase):

    Cinematic medium shot of a product on a marble surface, soft studio lighting, slow subtle camera push-in, shallow depth of field, premium commercial look, crisp 1080p detail

Lip-synced spokesperson (with `audio_url`):

    Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow depth of field, warm key light from camera-left.

Vertical platform-native:

    9:16 vertical short. A barista pulls a single espresso shot, steam rising into morning sun, rich crema slowly forming. Close-up handheld, shallow DOF, warm cafe ambience.

Limitations

- Duration cap 15s. For longer narratives, stitch multiple calls.
- No native 4K: 1080p ceiling.
- Aspect ratios: only the 5 documented values.
- Audio specs: 3–30s, ≤15MB, WAV/MP3 only.
- Reference media cap 5 (image + video + voice combined).
- For in-pass voice generation (no separate audio track), use Seedance 2.0 Pro; Wan accepts audio rather than generating it.
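Stitching a longer narrative means choosing clip lengths that each fit the 2–15 second range. The sketch below shows one possible split, using the fewest clips with near-equal lengths; `plan_segments` is a hypothetical helper, and the stitching itself (for example with a video editor) is outside its scope.

```python
import math
from typing import List

MIN_SECONDS, MAX_SECONDS = 2, 15  # documented duration range


def plan_segments(total_seconds: int) -> List[int]:
    """Split a target length into the fewest clips, each 2-15 whole seconds."""
    if total_seconds < MIN_SECONDS:
        raise ValueError("total must be at least 2 seconds")
    count = math.ceil(total_seconds / MAX_SECONDS)
    base, extra = divmod(total_seconds, count)
    # Spread the remainder so clip lengths differ by at most one second.
    return [base + 1] * extra + [base] * (count - extra)
```

Near-equal lengths avoid ending on a very short trailing clip, such as the 1-second remainder that a naive 15 + 15 + 1 split of 31 seconds would produce.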

Exit codes

| Code | Meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
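A wrapper script can use these codes to decide whether to retry. The sketch below retries only on code 75, the one code the table marks as retryable; treating 69 as non-retryable is an assumption, and the exponential backoff schedule and the names `describe` and `run_with_retry` are illustrative.

```python
import time
from typing import Callable

RETRYABLE = {75}  # "retryable: timeout / 429"
MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}


def describe(code: int) -> str:
    """Human-readable meaning for a documented exit code."""
    return MEANINGS.get(code, "unknown exit code")


def run_with_retry(
    run_once: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `run_once` until it returns a non-retryable code or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    code = run_once()
    for attempt in range(1, max_attempts):
        if code not in RETRYABLE:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        code = run_once()
    return code
```

Here `run_once` would be any zero-argument callable that launches the CLI and returns its exit code. Codes 64, 65, and 77 point to problems that a retry cannot fix, so they are returned immediately.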

How it works

The skill invokes `runcomfy run wan-ai/wan-2-7/text-to-video` with a JSON body matching the schema. The CLI POSTs to `https://model-api.runcomfy.net/v1/models/wan-ai/wan-2-7/text-to-video`, polls the request, fetches the result, and downloads any `.runcomfy.net` / `.runcomfy.com` URL into `--output-dir`. Ctrl-C cancels the remote request before exit.

Security & Privacy

- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
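The download whitelist and size cap described above can be illustrated with a short check. This is a sketch of the idea only, not the CLI's actual implementation; the function names are hypothetical, and requiring HTTPS for downloads is an added assumption.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")
MAX_DOWNLOAD_BYTES = 2 * 1024**3  # 2 GiB


def is_allowed_download(url: str) -> bool:
    """True for HTTPS URLs whose host is a subdomain of runcomfy.net or runcomfy.com."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    # The leading dot in each suffix rejects look-alike hosts such as evilruncomfy.com.
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)


def within_size_cap(content_length: int) -> bool:
    """True when a declared download size does not exceed the 2 GiB cap."""
    return content_length <= MAX_DOWNLOAD_BYTES
```

Matching on the parsed hostname rather than on the raw URL string is what keeps a URL like `https://runcomfy.net.evil.example/...` from passing the check.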
