GPT Image Edit — Pro Pack on RunComfy

runcomfy.com

·

Edit endpoint

·

Text-to-image sibling

·

GitHub

OpenAI

GPT Image 2 —

/edit

endpoint

(ChatGPT Images 2.0 image-to-image) on the

RunComfy Model API

. Strongest in its class at preserving identity through targeted edits and rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic).

npx skills

add

agentspace-so/runcomfy-skills

--skill

gpt-image-edit

-g

When to pick this model (vs siblings)

You want

Use

Edit multilingual / embedded text in image

GPT Image Edit

Identity preservation through translated headline variants

GPT Image Edit

Layout-precise edit (move headline, swap CTA, etc.)

GPT Image Edit

Up to 10 reference images

GPT Image Edit

Batch up to 20 images consistently

Nano Banana Edit

Single-shot precise local edit, source-fidelity-first

Flux Kontext

Generate from scratch with GPT Image 2

sibling

gpt-image-2

skill

Batch SKU galleries with stable identity

Nano Banana Edit

Prerequisites

RunComfy CLI

—

npm i -g @runcomfy/cli

RunComfy account

—

runcomfy login

opens a browser device-code flow.

CI / containers

— set

RUNCOMFY_TOKEN=

instead of

runcomfy login

.

Endpoints + input schema

openai/gpt-image-2/edit

Field

Type

Required

Default

Notes

prompt

string

yes

—

Edit instruction. Lead with preservation, end with the change.

images

string[]

yes

—

Up to 10

publicly-fetchable HTTPS URLs. First is primary; rest are auxiliary.

size

enum

no

auto

(preserve input),

1024_1024

(1:1),

1024_1536

(2:3 portrait),

1536_1024

(3:2 landscape).

size=auto

preserves the input ratio — strongly recommended unless the edit explicitly changes framing.

How to invoke

Single-ref preservation edit:

runcomfy run openai/gpt-image-2/edit

\

--input

'{

"prompt": "Keep the person'

\

'

's face, pose, and brand mark unchanged. Replace the background with a soft warm-grey studio sweep and a gentle floor shadow.",

"images": ["https://.../portrait.jpg"]

}'

\

--output-dir

<

absolute/path

>

Multilingual text rewrite (preserve everything except the headline):

runcomfy run openai/gpt-image-2/edit

\

--input

'{

"prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight as before.",

"images": ["https://.../poster-en.jpg"]

}'

\

--output-dir

<

absolute/path

>

Multi-ref composition:

runcomfy run openai/gpt-image-2/edit

\

--input

'{

"prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity (face, pose, clothing) unchanged.",

"images": ["https://.../subject.jpg", "https://.../room.jpg"]

}'

\

--output-dir

<

absolute/path

>

Prompting — what actually works

Lead with preservation goals.

Always:

"Keep [face / pose / clothing / brand / framing] unchanged."

Then state the change. The model honors what's stated up front.

Multilingual text — quote the characters, name the script.

"the headline reads \"コーヒー\" in bold Japanese kana"

,

"the label says \"АРОМА\" in Cyrillic, white on black"

,

"the right-margin caption reads \"تخفيض\" in Arabic right-to-left"

. Don't paraphrase — quote.

Directional language for spatial edits.

Concrete spatial scopes work:

"move the headline from top-right to bottom-center"

,

"remove the leftmost object only"

,

"replace the watermark in the bottom-right corner"

.

Multi-ref numbering.

When passing multiple

images

, refer to them by number:

"subject from image 1, lighting from image 2, color palette from image 3"

. The model routes cues correctly.

Use

size: "auto"

to preserve input ratio.

Only override when the edit explicitly changes framing (e.g. cropping a 16:9 to 1:1).

Anti-patterns:

Long compound edit instructions ("change A and B and C and D") → drift increases per added scope.

Missing preservation goals → model subtly rewrites the face / brand / framing.

Paraphrasing in-image text instead of quoting it → text comes out different.

Asking for

size

outside the 3 fixed values +

auto

→ 422.

Where it shines

Use case

Why GPT Image Edit

Multilingual ad localization

One source asset → many language variants of the same headline

Brand-safe headline / CTA swaps

Layout precision + preservation language hold the rest stable

Multi-ref composition (subject from one, scene from another)

Numbered refs route cues correctly

Layout-precise repositioning

Directional language ("top-right to bottom-center") honored

Identity preservation across signage edits

Strongest in class for face / brand preservation through targeted edits

Sample prompts (verified to produce strong results)

Background swap with full preservation (page example):

Turn the background into a bright minimal white-to-soft-gray studio

sweep with gentle floor shadow; add a large headline in-image that

reads "OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;

keep the main person or product, pose, and face identity unchanged

Multilingual variant:

Keep the photograph, layout, lighting, and brand mark exactly as in the

input. Replace only the in-image headline.

The new headline reads "コーヒー" in bold Japanese kana, same position

and font weight as before.

Multi-ref composition:

Compose subject from image 1 into the kitchen from image 2.

Match the warm window light and color palette of image 2.

Keep subject identity (face, pose, clothing) from image 1 unchanged.

Limitations

size

3 fixed values +

auto

— anything else 422s.

images

up to 10

— first is primary, rest are auxiliary cues.

Long compound prompts drift

— split into multiple passes when needed.

For batch consistency across many SKU images, Nano Banana Edit (up to 20) is better.

Photorealism on portraits

— Nano Banana Pro wins head-to-head.

Exit codes

code

meaning

0

success

64

bad CLI args

65

bad input JSON / schema mismatch

69

upstream 5xx

75

retryable: timeout / 429

77

not signed in or token rejected

Full reference:

docs.runcomfy.com/cli/troubleshooting

.

How it works

The skill invokes

runcomfy run openai/gpt-image-2/edit

with a JSON body matching the schema. The CLI POSTs to

https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/edit

, polls the request, fetches the result, and downloads any

.runcomfy.net

/

.runcomfy.com

URL into

--output-dir

.

Ctrl-C

cancels the remote request before exit.

Security & Privacy

Token storage

:

runcomfy login

writes the API token to

~/.config/runcomfy/token.json

with mode 0600 (owner-only read/write). Set

RUNCOMFY_TOKEN

env var to bypass the file entirely in CI / containers.

Input boundary

the user prompt is passed as a JSON string to the CLI via

--input

. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.

Third-party content

image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.

Outbound endpoints

only
model-api.runcomfy.net
(request submission) and
*.runcomfy.net
/
*.runcomfy.com
(download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.

安装