gpt-image-edit

Installs: 106.1K
Rank: #126

Install

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill gpt-image-edit
GPT Image Edit — Pro Pack on RunComfy
runcomfy.com · Edit endpoint · Text-to-image sibling · GitHub
OpenAI GPT Image 2: the /edit endpoint (ChatGPT Images 2.0 image-to-image) on the RunComfy Model API. Strongest in its class at preserving identity through targeted edits and rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic).
npx skills add agentspace-so/runcomfy-skills --skill gpt-image-edit -g
When to pick this model (vs siblings)
| You want | Use |
| --- | --- |
| Edit multilingual / embedded text in image | GPT Image Edit |
| Identity preservation through translated headline variants | GPT Image Edit |
| Layout-precise edit (move headline, swap CTA, etc.) | GPT Image Edit |
| Up to 10 reference images | GPT Image Edit |
| Batch up to 20 images consistently | Nano Banana Edit |
| Single-shot precise local edit, source-fidelity-first | Flux Kontext |
| Generate from scratch with GPT Image 2 | sibling gpt-image-2 skill |
| Batch SKU galleries with stable identity | Nano Banana Edit |
Prerequisites
RunComfy CLI: npm i -g @runcomfy/cli
RunComfy account: runcomfy login opens a browser device-code flow.
CI / containers: set the RUNCOMFY_TOKEN environment variable instead of running runcomfy login.
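The prerequisites above can be checked before a run. The following is a minimal sketch, not part of the skill itself: the function name `preflight` and its injectable `env` and `which` parameters are hypothetical, chosen so the check can be exercised without touching the real environment. A missing RUNCOMFY_TOKEN is reported as a note rather than an error, since an interactive `runcomfy login` may already have stored a token.

```python
import os
import shutil


def preflight(env=None, which=shutil.which) -> list[str]:
    """Return human-readable notes about missing prerequisites (empty if none)."""
    env = os.environ if env is None else env
    notes = []
    # The CLI is installed globally via npm, so it should resolve on PATH.
    if which("runcomfy") is None:
        notes.append("runcomfy CLI not found on PATH; install with: npm i -g @runcomfy/cli")
    # In CI / containers the token comes from the environment instead of a login.
    if not env.get("RUNCOMFY_TOKEN"):
        notes.append("RUNCOMFY_TOKEN is not set; run `runcomfy login` or set the variable in CI")
    return notes


if __name__ == "__main__":
    for note in preflight():
        print(note)
```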
Endpoints + input schema
openai/gpt-image-2/edit
| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | - | Edit instruction. Lead with preservation, end with the change. |
| images | string[] | yes | - | Up to 10 publicly-fetchable HTTPS URLs. First is primary; rest are auxiliary. |
| size | enum | no | auto | auto (preserve input), 1024_1024 (1:1), 1024_1536 (2:3 portrait), 1536_1024 (3:2 landscape). |

size=auto preserves the input ratio and is strongly recommended unless the edit explicitly changes framing.
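The schema above is small enough to check locally before submitting a request. The sketch below is an illustration under stated assumptions: `validate_edit_input` is a hypothetical helper, not part of the RunComfy CLI, and the server remains the authority on what it accepts. It only mirrors the documented constraints (non-empty prompt, 1 to 10 HTTPS image URLs, four allowed size values).

```python
from urllib.parse import urlparse

# Values taken from the schema table; a tuple avoids hashing odd inputs.
ALLOWED_SIZES = ("auto", "1024_1024", "1024_1536", "1536_1024")
MAX_IMAGES = 10


def validate_edit_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")

    images = payload.get("images")
    if not isinstance(images, list) or not images:
        problems.append("images must be a non-empty list of URLs")
    else:
        if len(images) > MAX_IMAGES:
            problems.append(f"images has {len(images)} entries; the limit is {MAX_IMAGES}")
        for url in images:
            if not isinstance(url, str) or urlparse(url).scheme != "https":
                problems.append(f"not an HTTPS URL: {url!r}")

    # size is optional and defaults to "auto".
    size = payload.get("size", "auto")
    if size not in ALLOWED_SIZES:
        problems.append(f"unsupported size: {size!r}")

    return problems
```

Catching an unsupported size locally avoids a round trip that would otherwise end in a 422.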
How to invoke
Single-ref preservation edit:
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the person'\''s face, pose, and brand mark unchanged. Replace the background with a soft warm-grey studio sweep and a gentle floor shadow.",
    "images": ["https://.../portrait.jpg"]
  }' \
  --output-dir <absolute/path>
Multilingual text rewrite (preserve everything except the headline):
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight as before.",
    "images": ["https://.../poster-en.jpg"]
  }' \
  --output-dir <absolute/path>
Multi-ref composition:
runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity (face, pose, clothing) unchanged.",
    "images": ["https://.../subject.jpg", "https://.../room.jpg"]
  }' \
  --output-dir <absolute/path>
Prompting — what actually works
Lead with preservation goals. Always: "Keep [face / pose / clothing / brand / framing] unchanged." Then state the change. The model honors what's stated up front.

Multilingual text: quote the characters, name the script. "the headline reads \"コーヒー\" in bold Japanese kana", "the label says \"АРОМА\" in Cyrillic, white on black", "the right-margin caption reads \"تخفيض\" in Arabic right-to-left". Don't paraphrase; quote.

Directional language for spatial edits. Concrete spatial scopes work: "move the headline from top-right to bottom-center", "remove the leftmost object only", "replace the watermark in the bottom-right corner".

Multi-ref numbering. When passing multiple images, refer to them by number: "subject from image 1, lighting from image 2, color palette from image 3". The model routes cues correctly.

Use size: "auto" to preserve input ratio. Only override when the edit explicitly changes framing (e.g. cropping a 16:9 to 1:1).
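The first two tips (preservation first, quoted text with a named script) lend themselves to a small template. This is a sketch only: `build_text_swap_prompt` and its parameters are hypothetical names, and the sentence pattern is copied from the multilingual example in this document rather than from any official prompt format.

```python
def build_text_swap_prompt(preserve: list[str], new_text: str, script: str, style: str = "") -> str:
    """Compose a preservation-first prompt that quotes the exact replacement text."""
    # Preservation goals go first, because the model honors what is stated up front.
    keep = ", ".join(preserve)
    # Quote the literal characters instead of describing them, and name the script.
    change = f'The new headline reads "{new_text}" in {script}'
    if style:
        change += f", {style}"
    return f"Keep the {keep} exactly as in the input. Replace only the in-image headline. {change}."


if __name__ == "__main__":
    print(build_text_swap_prompt(
        ["photograph", "layout", "brand mark"],
        "コーヒー",
        "bold Japanese kana",
        "same position and font weight as before",
    ))
```

Generating one prompt per language from the same template keeps the preservation clause identical across variants, so only the quoted headline changes between runs.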
Anti-patterns:
Long compound edit instructions ("change A and B and C and D") → drift increases per added scope.
Missing preservation goals → model subtly rewrites the face / brand / framing.
Paraphrasing in-image text instead of quoting it → text comes out different.
Asking for size outside the 3 fixed values + auto → 422.
Where it shines
| Use case | Why GPT Image Edit |
| --- | --- |
| Multilingual ad localization | One source asset → many language variants of the same headline |
| Brand-safe headline / CTA swaps | Layout precision + preservation language hold the rest stable |
| Multi-ref composition (subject from one, scene from another) | Numbered refs route cues correctly |
| Layout-precise repositioning | Directional language ("top-right to bottom-center") honored |
| Identity preservation across signage edits | Strongest in class for face / brand preservation through targeted edits |
Sample prompts (verified to produce strong results)
Background swap with full preservation (page example):
Turn the background into a bright minimal white-to-soft-gray studio
sweep with gentle floor shadow; add a large headline in-image that
reads "OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged
Multilingual variant:
Keep the photograph, layout, lighting, and brand mark exactly as in the
input. Replace only the in-image headline.
The new headline reads "コーヒー" in bold Japanese kana, same position
and font weight as before.
Multi-ref composition:
Compose subject from image 1 into the kitchen from image 2.
Match the warm window light and color palette of image 2.
Keep subject identity (face, pose, clothing) from image 1 unchanged.
Limitations
size: 3 fixed values + auto; anything else 422s.
images: up to 10; first is primary, rest are auxiliary cues.
Long compound prompts drift: split into multiple passes when needed.
For batch consistency across many SKU images, Nano Banana Edit (up to 20) is better.
Photorealism on portraits: Nano Banana Pro wins head-to-head.
Exit codes
| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
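A wrapper script can act on these exit codes. The sketch below assumes only what the table states: code 75 is the one marked retryable, so it is the only code retried. The names `describe_exit` and `run_with_retry`, the attempt count, and the exponential backoff delays are illustrative choices, not CLI behavior. `run_once` is any callable that performs one CLI run and returns its exit code.

```python
import time

# Meanings copied from the exit-code table above.
EXIT_CODES = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}  # only the code the table labels as retryable


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def run_with_retry(run_once, max_attempts: int = 3, base_delay: float = 2.0, sleep=time.sleep) -> int:
    """Call run_once until it returns a non-retryable code or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code not in RETRYABLE or attempt == max_attempts:
            return code
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
    return code
```

Codes 64, 65, and 77 indicate problems with the arguments, input, or credentials, so repeating the same call would not help; the sketch returns them immediately.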
How it works
The skill invokes runcomfy run openai/gpt-image-2/edit with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/edit, polls the request, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
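The invocation described above can also be driven from a script. This is a minimal sketch assuming only the command, `--input`, and `--output-dir` shown in this document; `build_edit_command` and `run_edit` are hypothetical helper names. Passing the arguments as a list, without a shell, sidesteps the quote escaping needed when a prompt contains an apostrophe.

```python
import json
import subprocess


def build_edit_command(prompt: str, images: list[str], output_dir: str) -> list[str]:
    """Build the argv list for `runcomfy run`, serializing the input as JSON."""
    payload = {"prompt": prompt, "images": images}
    return [
        "runcomfy", "run", "openai/gpt-image-2/edit",
        # ensure_ascii=False keeps kana, Cyrillic, Arabic, etc. readable in the JSON.
        "--input", json.dumps(payload, ensure_ascii=False),
        "--output-dir", output_dir,
    ]


def run_edit(prompt: str, images: list[str], output_dir: str) -> int:
    """Run the CLI without a shell and return its exit code."""
    cmd = build_edit_command(prompt, images, output_dir)
    return subprocess.run(cmd).returncode
```

`run_edit` requires the RunComfy CLI to be installed and authenticated; `build_edit_command` can be inspected on its own to confirm the JSON body before anything is sent.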
Security & Privacy
Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
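The download whitelist above amounts to a hostname suffix check. The sketch below illustrates the idea and is not the CLI's actual implementation: `is_allowed_download_url` is a hypothetical name, and two details are assumptions, namely that only HTTPS URLs are accepted and that the bare apex domains (runcomfy.net, runcomfy.com) are excluded because the patterns are written as `*.` wildcards.

```python
from urllib.parse import urlparse

# Suffixes include the leading dot so look-alike hosts do not match.
ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")


def is_allowed_download_url(url: str) -> bool:
    """Return True if the URL points at a whitelisted RunComfy subdomain over HTTPS."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIXES)
```

Matching on the parsed hostname, rather than on the raw URL string, is what rejects addresses such as `https://runcomfy.net.evil.example/...`, where the trusted name appears only as a prefix.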