Then state the change in one clean sentence. Models honor what is stated up front; preservation goals tacked on at the end are often ignored.
Localize with spatial language.
"background only", "the left object", "the upper-right corner", "above the headline" — concrete spatial scopes are honored. "make it more X" is vague and drifts.
Batch consistency.
When editing a series, lock aspect_ratio and resolution. Use the same prompt grammar across the batch so each output reads as a sibling, not a remix.
Iterate small.
If a one-pass edit drifts, split it into two: pass 1 changes the background only, pass 2 swaps the subject's outfit. The edits come out cleaner at the same total cost (assuming similar resolution).
Multi-image variation.
Pass up to 20 inputs to get a coherent batch. Useful for SKU galleries, A/B testing, and character-sheet variations.
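The batch-consistency and iterate-small tips can be sketched in Python. This is a minimal illustration under assumptions, not the skill's own code. The request fields "prompt" and "image_urls" are hypothetical names, so check the model schema. The values "1:1" and "2K" are placeholders for aspect_ratio and resolution. edit is a hypothetical stand-in for whatever performs one edit call. build_batch makes one body per image, each with the same locked settings and prompt template. run_passes feeds each pass's output image into the next pass.

```python
import json
from typing import Callable, Dict, List

# Locked for the whole series so every output shares framing and size.
# "1:1" and "2K" are placeholder values; use what the model schema accepts.
LOCKED = {"aspect_ratio": "1:1", "resolution": "2K"}

# One prompt grammar for every call: preservation goal first, then the change.
TEMPLATE = "Keep the subject's pose and identity unchanged. {change}"


def build_batch(image_urls: List[str], change: str) -> List[Dict]:
    """Return one request body per input image, all sharing LOCKED and TEMPLATE.

    "prompt" and "image_urls" are hypothetical field names.
    """
    return [
        {"prompt": TEMPLATE.format(change=change), "image_urls": [url], **LOCKED}
        for url in image_urls
    ]


def run_passes(image: str, changes: List[str], edit: Callable[[str, str], str]) -> str:
    """Apply one scoped change per pass, feeding each output into the next pass.

    edit(image, prompt) -> new_image is a hypothetical stand-in for one edit call.
    """
    for change in changes:
        image = edit(image, TEMPLATE.format(change=change))
    return image


if __name__ == "__main__":
    urls = ["https://example.com/a.png", "https://example.com/b.png"]
    for body in build_batch(urls, "Convert the background to a soft warm-grey studio sweep."):
        print(json.dumps(body))

    # Fake edit call that only logs the prompt, to show how passes chain.
    def fake_edit(image: str, prompt: str) -> str:
        print("edit:", prompt)
        return image + "'"

    final = run_passes(
        "input.png",
        [
            "Change the background only to a sunlit beach.",
            "Swap only the subject's outfit to a navy linen suit.",
        ],
        fake_edit,
    )
    print(final)
```

The template and the locked settings live at module level, so no single call in a series can quietly diverge from the others.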
Anti-patterns:
Long compound instructions ("change A and B and C and D"): drift increases with each added scope.
Edit instructions in passive voice ("the background should be changed"): write them as imperatives instead.
Missing preservation goals: the model will subtly rewrite the face or brand.
Aspect ratios that don't match the input: these cause crops or stretches.
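A rough pre-flight lint can catch these anti-patterns before a call is spent. This Python sketch is heuristic and hypothetical. The keyword lists, the limit of two "and"s, the 1% ratio tolerance, and the "W:H" string format assumed for aspect_ratio are all choices to tune, not documented model behavior.

```python
import re

# Heuristic markers only; tune them to your own prompt style.
PASSIVE = re.compile(r"\b(should|must|needs to) be \w+ed\b", re.IGNORECASE)
PRESERVE = re.compile(r"\b(keep|preserve|unchanged|exactly as)\b", re.IGNORECASE)


def lint_prompt(prompt, input_size=None, aspect_ratio=None, max_ands=2):
    """Return warnings for the four anti-patterns listed above.

    input_size is (width, height) in pixels; aspect_ratio is assumed to be
    a "W:H" string such as "16:9".
    """
    warnings = []
    if len(re.findall(r"\band\b", prompt, re.IGNORECASE)) > max_ands:
        warnings.append("compound instruction: split into passes")
    if PASSIVE.search(prompt):
        warnings.append("passive voice: use the imperative")
    if not PRESERVE.search(prompt):
        warnings.append("no preservation goal stated")
    if input_size and aspect_ratio:
        width, height = input_size
        num, den = (int(x) for x in aspect_ratio.split(":"))
        target = num / den
        # Allow 1% slack for odd pixel sizes such as 1023x1024.
        if abs(width / height - target) > 0.01 * target:
            warnings.append("aspect_ratio does not match the input")
    return warnings


if __name__ == "__main__":
    print(lint_prompt("The background should be changed and the sky and the logo and the text updated."))
```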
Where it shines
SKU gallery (same product on different backgrounds): batch of 20, identity preserved, framing locked.
Influencer / spokesperson background swaps: strong identity preservation across edits.
Localized object removal / addition: spatial language is honored.
A/B variants for ad creative: seed lock plus multiple outputs via number_of_images.
Brand-asset relocalization: same composition with a text / palette swap.
Sample prompts (verified to produce strong results)
Background swap (page example):
Keep the subject identity unchanged. Convert the background into a rainy
neon cyberpunk street.
Targeted text replacement:
Keep the bottle, label, and lighting exactly as in the input.
Replace only the brand text on the label from "ALPHA" to "AURA",
same font weight, centered, white on black.
Multi-image batch consistency:
For each input image: keep the subject's pose and identity unchanged.
Convert the background to a soft warm-grey studio sweep with subtle
floor shadow. Center the subject at the same fraction of frame as the
input.
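The targeted text-replacement prompt contains double quotes, so a hand-written --input JSON string is easy to break. This minimal Python sketch lets json.dumps do the escaping. "prompt" is a hypothetical field name, because the schema is not shown on this page.

```python
import json

# The targeted text-replacement prompt from above, joined onto one line.
PROMPT = (
    "Keep the bottle, label, and lighting exactly as in the input. "
    'Replace only the brand text on the label from "ALPHA" to "AURA", '
    "same font weight, centered, white on black."
)

# "prompt" is a hypothetical field name; check the model schema.
body = json.dumps({"prompt": PROMPT})
print(body)  # the inner quotes come out escaped as \"ALPHA\" and \"AURA\"
```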
Limitations
1–20 input images per call. The first is treated as primary; the rest provide auxiliary cues.
1–4 outputs per call.
Long compound prompts drift. Split them into multiple passes.
Web search adds latency and cost. Enable it only when needed.
For multilingual in-image text edits, GPT Image 2 edit wins.
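The per-call limits are cheap to check locally before invoking the CLI. This is a minimal Python sketch. check_limits is a hypothetical helper, and it assumes the input images are passed as an ordered list whose first entry is the primary image.

```python
from typing import Sequence

MAX_INPUTS = 20   # 1-20 input images per call
MAX_OUTPUTS = 4   # 1-4 outputs per call


def check_limits(image_urls: Sequence[str], number_of_images: int = 1) -> None:
    """Raise ValueError if a request exceeds the documented per-call limits.

    image_urls[0] is the primary image; any others are auxiliary cues.
    """
    if not 1 <= len(image_urls) <= MAX_INPUTS:
        raise ValueError(f"need 1-{MAX_INPUTS} input images, got {len(image_urls)}")
    if not 1 <= number_of_images <= MAX_OUTPUTS:
        raise ValueError(f"number_of_images must be 1-{MAX_OUTPUTS}, got {number_of_images}")


if __name__ == "__main__":
    check_limits(["https://example.com/primary.png", "https://example.com/ref.png"], number_of_images=2)
    print("ok")
```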
Exit codes
0: success
64: bad CLI arguments
65: bad input JSON or schema mismatch
69: upstream 5xx error
75: retryable (timeout or 429 rate limit)
77: not signed in, or token rejected
Full reference: docs.runcomfy.com/cli/troubleshooting.
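Only code 75 is marked retryable. A wrapper script can therefore back off and retry on 75 and stop on every other code. This is a minimal Python sketch. run is any zero-argument callable that returns the CLI's exit code. The attempt count and delays are arbitrary choices, and describe and run_with_retry are hypothetical helpers.

```python
import time
from typing import Callable

# Meanings follow the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def describe(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_with_retry(
    run: Callable[[], int],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run() up to `attempts` times, retrying only on retryable codes.

    Delays double each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    Returns the last exit code seen.
    """
    code = run()
    for attempt in range(1, attempts):
        if code not in RETRYABLE:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        code = run()
    return code
```

For example, run could be lambda: subprocess.run(argv).returncode, with argv an argument list for the runcomfy command. Code 77 calls for runcomfy login (or a valid RUNCOMFY_TOKEN) rather than a retry. This sketch does not retry 69, because the table does not mark it retryable.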
How it works
The skill invokes runcomfy run google/nano-banana-2/edit with a JSON body matching the schema. The CLI POSTs the request to the Model API (model-api.runcomfy.net), polls it, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
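When another program drives the CLI, passing the arguments as a list keeps the JSON body as a single argv entry that no shell ever parses. This is a minimal Python sketch. build_argv and run_edit are hypothetical helpers, and the sketch uses only the command, --input, and --output-dir named on this page.

```python
import json
import subprocess


def build_argv(body, output_dir, model="google/nano-banana-2/edit"):
    """Build the CLI argv as a list, with the JSON body as one argument."""
    return [
        "runcomfy", "run", model,
        "--input", json.dumps(body),
        "--output-dir", output_dir,
    ]


def run_edit(body, output_dir):
    """Run the CLI and return its exit code."""
    # A list with shell=False (the default) means quotes, $(...) or
    # backticks inside the prompt are never interpreted by a shell.
    result = subprocess.run(build_argv(body, output_dir))
    return result.returncode
```

run_edit returns the raw exit code, which maps onto the exit-code table above.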
Security & Privacy
Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). In CI or containers, set the RUNCOMFY_TOKEN environment variable to bypass the file entirely.
Input boundary: the user prompt is passed to the CLI as a JSON string via --input. The CLI does not shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS, so prompt content opens no shell-injection surface.
Third-party content: image, mask, and video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit or video-edit model.
Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download larger than 2 GiB, so a malicious or runaway model output cannot fill the disk.
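You can mirror three of these guarantees in your own tooling: token file permissions, the download host whitelist, and the 2 GiB cap. This could be useful when auditing a CI machine or re-downloading results yourself. This is a minimal Python sketch under stated assumptions, not the CLI's implementation. token_source, is_allowed_download, copy_capped, and DownloadTooLarge are hypothetical names. Two further assumptions apply. First, downloads must use https. Second, only true subdomains match, so the bare runcomfy.net and runcomfy.com hosts are rejected. The token file's contents are never read, because its format is not documented here.

```python
import os
import stat
from pathlib import Path
from urllib.parse import urlsplit

TOKEN_FILE = Path.home() / ".config" / "runcomfy" / "token.json"
ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")
CAP_BYTES = 2 * 1024 ** 3  # 2 GiB per generated file


class DownloadTooLarge(Exception):
    """Raised when a single download passes the size cap."""


def token_source(env=None, token_file=TOKEN_FILE):
    """Report where a token would come from, without reading the token.

    RUNCOMFY_TOKEN takes precedence over the file. The file must grant no
    permissions to group or others (0600 or stricter).
    """
    env = os.environ if env is None else env
    if env.get("RUNCOMFY_TOKEN"):
        return "env"
    path = Path(token_file)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run `runcomfy login`")
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is {oct(mode)}; expected 0o600")
    return "file"


def is_allowed_download(url):
    """True only for https URLs whose host is a subdomain of an allowed domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)


def copy_capped(src, dst, cap=CAP_BYTES, chunk=1024 * 1024):
    """Stream src into dst, aborting as soon as more than cap bytes arrive."""
    total = 0
    while True:
        block = src.read(chunk)
        if not block:
            return total
        total += len(block)
        if total > cap:
            raise DownloadTooLarge(f"download exceeded {cap} bytes")
        dst.write(block)
```

copy_capped counts bytes as they stream, rather than trusting a declared size that may be missing or wrong. On DownloadTooLarge, the caller should delete the partially written file.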