Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Jimeng (即梦), Seedream (豆包) and Replicate providers.
Script Directory
Agent Execution:
1. {baseDir} = this SKILL.md file's directory
2. Script path = {baseDir}/scripts/main.ts
3. Resolve the ${BUN_X} runtime: if bun is installed → bun; if npx is available → npx -y bun; otherwise suggest installing bun
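The runtime resolution above can be sketched as a small function. This is only an illustration: `resolveBunX` and `CommandProbe` are hypothetical names, and the probe that decides whether a command is installed is passed in by the caller rather than implemented here.

```typescript
// Sketch only: resolveBunX and CommandProbe are hypothetical names,
// not part of scripts/main.ts.
type CommandProbe = (command: string) => boolean;

// Returns the value to use for ${BUN_X}, or null when neither bun nor npx
// is available (the agent should then suggest installing bun).
function resolveBunX(isAvailable: CommandProbe): string | null {
  if (isAvailable("bun")) return "bun";
  if (isAvailable("npx")) return "npx -y bun";
  return null;
}
```

One possible probe is a wrapper around `spawnSync(command, ["--version"])` from `node:child_process` that checks for exit status 0. Injecting the probe keeps the decision logic testable without touching the system.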
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check whether EXTEND.md exists (priority: project → XDG config → user):

    # macOS, Linux, WSL, Git Bash
    test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
    test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "xdg"
    test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"

    # PowerShell (Windows)
    if (Test-Path .baoyu-skills/baoyu-image-gen/EXTEND.md) { "project" }
    $xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
    if (Test-Path "$xdg/baoyu-skills/baoyu-image-gen/EXTEND.md") { "xdg" }
    if (Test-Path "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md") { "user" }

| Result | Action |
|--------|--------|
| Found | Load, parse, apply settings. If default_model.[provider] is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.

| Path | Location |
|------|----------|
| .baoyu-skills/baoyu-image-gen/EXTEND.md | Project directory |
| $HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md | User home |
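
The lookup performed by the shell checks above can be sketched in TypeScript. The helper names (`extendCandidates`, `findExtend`) are hypothetical, and the sketch assumes that the first existing file in the order project, xdg, user is the one that gets loaded. Paths are joined POSIX-style for brevity.

```typescript
// Sketch only: helper names are hypothetical; the paths and labels mirror
// the shell checks in this step.
interface ExtendCandidate {
  label: "project" | "xdg" | "user";
  path: string;
}

function extendCandidates(cwd: string, home: string, xdgConfigHome?: string): ExtendCandidate[] {
  const rel = "baoyu-skills/baoyu-image-gen/EXTEND.md";
  // Same fallback as ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty means $HOME/.config.
  const xdg = xdgConfigHome && xdgConfigHome.length > 0 ? xdgConfigHome : `${home}/.config`;
  return [
    { label: "project", path: `${cwd}/.${rel}` },
    { label: "xdg", path: `${xdg}/${rel}` },
    { label: "user", path: `${home}/.${rel}` },
  ];
}

// ASSUMPTION: the first candidate that exists wins.
function findExtend(candidates: ExtendCandidate[], exists: (path: string) => boolean): ExtendCandidate | null {
  return candidates.find((c) => exists(c.path)) ?? null;
}
```

In real use, `exists` could be `existsSync` from `node:fs`. A `null` result corresponds to the "Not found" row, which means first-time setup must run before any generation.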

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: references/config/preferences-schema.md

Usage

    # Basic
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

    # With aspect ratio
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

    # High quality
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

    # From prompt files
    ${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

    # With reference images (Google, OpenAI, OpenRouter, or Replicate)
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

    # With reference images (explicit provider/model)
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

    # OpenRouter (recommended default model)
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

    # OpenRouter with reference images
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

    # Specific provider
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

    # DashScope (阿里通义万象); the prompt reads "A cute cat"
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

    # DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
    # The prompt reads "Design a 21:9 banner poster for a coffee brand, with a clear Chinese title"
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报，包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

    # DashScope legacy Qwen fixed-size model; the prompt reads "A cinematic poster"
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

    # Replicate (google/nano-banana-pro)
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

    # Replicate with specific model
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

    # Batch mode with saved prompt files
    ${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

    # Batch mode with explicit worker count
    ${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json

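
When the script is launched from another program rather than typed into a shell, building the argument list as an array avoids quoting problems with prompts that contain spaces or non-ASCII text. The following is a minimal sketch under that assumption; `buildArgs` and `SingleImageOptions` are hypothetical names, and only flags shown in the commands above are used.

```typescript
// Sketch only: buildArgs and SingleImageOptions are hypothetical names.
// Every flag used here appears in the usage commands of this document.
interface SingleImageOptions {
  prompt: string;
  image: string;
  ar?: string;
  quality?: "normal" | "2k";
  provider?: string;
  model?: string;
  ref?: string;
}

// Returns the arguments that follow ${BUN_X}, one array element per token,
// so no shell quoting is needed.
function buildArgs(baseDir: string, o: SingleImageOptions): string[] {
  const args = [`${baseDir}/scripts/main.ts`, "--prompt", o.prompt, "--image", o.image];
  if (o.ar) args.push("--ar", o.ar);
  if (o.quality) args.push("--quality", o.quality);
  if (o.provider) args.push("--provider", o.provider);
  if (o.model) args.push("--model", o.model);
  if (o.ref) args.push("--ref", o.ref);
  return args;
}
```

The resulting array could be handed to a process launcher such as `spawnSync` from `node:child_process`, with the resolved `${BUN_X}` command as the executable.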
Batch File Format
    {
      "jobs": 4,
      "tasks": [
        {
          "id": "hero",
          "promptFiles": ["prompts/hero.md"],
          "image": "out/hero.png",
          "provider": "replicate",
          "model": "google/nano-banana-pro",
          "ar": "16:9",
          "quality": "2k"
        },
        {
          "id": "diagram",
          "promptFiles": ["prompts/diagram.md"],
          "image": "out/diagram.png",
          "ref": ["references/original.png"]
        }
      ]
    }

Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). A top-level array format (without the jobs wrapper) is also accepted.
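
The rules above (two accepted shapes, CLI --jobs overriding the file's jobs, paths relative to the batch file) can be sketched as a normalizer. This is not the script's actual loader: `normalizeBatch` and the type names are hypothetical, field optionality is guessed from the example, and POSIX-style path joining is used for brevity where `path.resolve` from `node:path` would be more robust.

```typescript
// Sketch only: names and field optionality are assumptions based on the
// example batch file, not the script's real types.
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  provider?: string;
  model?: string;
  ar?: string;
  quality?: string;
  ref?: string[];
}

interface BatchFile {
  jobs?: number;
  tasks: BatchTask[];
}

function normalizeBatch(parsed: BatchFile | BatchTask[], batchDir: string, cliJobs?: number): BatchFile {
  // Accept both { jobs, tasks } and a bare top-level array of tasks.
  const file: BatchFile = Array.isArray(parsed) ? { tasks: parsed } : parsed;
  // Resolve relative paths against the batch file's directory.
  const abs = (p: string): string => (p.startsWith("/") ? p : `${batchDir}/${p}`);
  return {
    jobs: cliJobs ?? file.jobs, // CLI --jobs overrides the file's "jobs"
    tasks: file.tasks.map((t) => ({
      ...t,
      image: abs(t.image),
      promptFiles: t.promptFiles?.map(abs),
      ref: t.ref?.map(abs),
    })),
  };
}
```

For a batch file at /proj/batch.json, a task with "image": "out/hero.png" would resolve to /proj/out/hero.png regardless of the directory the command is run from.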
Options
| Option | Description |
|--------|-------------|
| --prompt, -p | Prompt text |
| --promptfiles | Read prompt from files (concatenated) |
| --image | Output image path (required in single-image mode) |
| --batchfile | JSON batch file for multi-image generation |
| --jobs | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| --provider google\|openai\|openrouter\|dashscope\|jimeng\|seedream\|replicate | Force provider (default: auto-detect) |
| --model, -m | Model ID (Google: gemini-3-pro-image-preview; OpenAI: gpt-image-1.5; OpenRouter: google/gemini-3.1-flash-image-preview; DashScope: qwen-image-2.0-pro) |
| --ar | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: 2k) |
| --imageSize 1K\|2K\|4K | Image size for Google/OpenRouter (default: from quality) |
| --ref | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, OpenRouter multimodal models, and Replicate. Not supported by Jimeng or Seedream |
| --n | Number of images |
| --json | JSON output |

Environment Variables
| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| GOOGLE_API_KEY | Google API key |
| DASHSCOPE_API_KEY | DashScope API key (阿里云, Alibaba Cloud) |
| REPLICATE_API_TOKEN | Replicate API token |
| JIMENG_ACCESS_KEY_ID | Jimeng (即梦) Volcengine access key |
| JIMENG_SECRET_ACCESS_KEY | Jimeng (即梦) Volcengine secret key |
| ARK_API_KEY | Seedream (豆包) Volcengine ARK API key |
| OPENAI_IMAGE_MODEL | OpenAI model override |
| OPENROUTER_IMAGE_MODEL | OpenRouter model override (default: google/gemini-3.1-flash-image-preview) |
| GOOGLE_IMAGE_MODEL | Google model override |
| DASHSCOPE_IMAGE_MODEL | DashScope model override (default: qwen-image-2.0-pro) |
| REPLICATE_IMAGE_MODEL | Replicate model override (default: google/nano-banana-pro) |
| JIMENG_IMAGE_MODEL | Jimeng model override (default: jimeng_t2i_v40) |
| SEEDREAM_IMAGE_MODEL | Seedream model override (default: doubao-seedream-5-0-260128) |
| OPENAI_BASE_URL | Custom OpenAI endpoint |
| OPENROUTER_BASE_URL | Custom OpenRouter endpoint (default: https://openrouter.ai/api/v1) |
| OPENROUTER_HTTP_REFERER | Optional app/site URL for OpenRouter attribution |
| OPENROUTER_TITLE | Optional app name for OpenRouter attribution |
| GOOGLE_BASE_URL | Custom Google endpoint |
| DASHSCOPE_BASE_URL | Custom DashScope endpoint |
| REPLICATE_BASE_URL | Custom Replicate endpoint |
| JIMENG_BASE_URL | Custom Jimeng endpoint (default: https://visual.volcengineapi.com) |
| JIMENG_REGION | Jimeng region (default: cn-north-1) |
| SEEDREAM_BASE_URL | Custom Seedream endpoint (default: https://ark.cn-beijing.volces.com/api/v3) |
| BAOYU_IMAGE_GEN_MAX_WORKERS | Override batch worker cap |
| BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY | Override provider concurrency, e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY |
| BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS | Override provider start gap, e.g. BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS |

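
Several of the variables above follow a per-provider naming pattern. The sketch below derives those names from a provider ID. `providerEnvNames` is a hypothetical helper, and applying the concurrency and start-interval pattern to every provider is an assumption extrapolated from the Replicate examples. API key variables are left out because their names do not follow one pattern (for example REPLICATE_API_TOKEN and ARK_API_KEY).

```typescript
// Sketch only: providerEnvNames is a hypothetical helper.
type Provider = "google" | "openai" | "openrouter" | "dashscope" | "jimeng" | "seedream" | "replicate";

function providerEnvNames(provider: Provider) {
  const p = provider.toUpperCase();
  return {
    model: `${p}_IMAGE_MODEL`,
    baseUrl: `${p}_BASE_URL`,
    // ASSUMPTION: the batch-limit pattern shown for Replicate applies to all providers.
    concurrency: `BAOYU_IMAGE_GEN_${p}_CONCURRENCY`,
    startIntervalMs: `BAOYU_IMAGE_GEN_${p}_START_INTERVAL_MS`,
  };
}
```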
Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env

Model Resolution
Model priority (highest → lowest), applies to all providers:
1. CLI flag: --model
2. EXTEND.md: default_model.[provider]
3. Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
4. Built-in default

EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.

Agent MUST display model info before each generation:
- Show: Using [provider] / [model]
- Show switch hint: Switch model: --model | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL

DashScope Models
Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.

Official DashScope model families:
- qwen-image-2.0-pro, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0, qwen-image-2.0-2026-03-03
  - Free-form size in width*height format
  - Total pixels must stay between 512*512 and 2048*2048
  - Default size is approximately 1024*1024
  - Best choice for custom ratios such as 21:9 and text-heavy Chinese/English layouts
- qwen-image-max, qwen-image-max-2025-12-30, qwen-image-plus, qwen-image-plus-2026-01-09, qwen-image
  - Fixed sizes only: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664
  - Default size is 1664*928
  - qwen-image currently has the same capability as qwen-image-plus
- Legacy DashScope models such as z-image-turbo, z-image-ultra, wanx-v1
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility

When translating CLI args into DashScope behavior:
- --size wins over --ar
- For qwen-image-2.0, prefer explicit --size; otherwise infer from --ar and use the official recommended resolutions below
- For qwen-image-max/plus/image, only use the five official fixed sizes; if the requested ratio is not covered, switch to qwen-image-2.0-pro
- --quality is a baoyu-image-gen compatibility preset, not a native DashScope API field. Mapping normal / 2k onto the qwen-image-2.0 table below is an implementation inference, not an official API guarantee

Recommended qwen-image-2.0 sizes for common aspect ratios:

| Ratio | normal | 2k |
|-------|--------|----|
| 1:1 | 1024*1024 | 1536*1536 |
| 2:3 | 768*1152 | 1024*1536 |
| 3:2 | 1152*768 | 1536*1024 |
| 3:4 | 960*1280 | 1080*1440 |
| 4:3 | 1280*960 | 1440*1080 |
| 9:16 | 720*1280 | 1080*1920 |
| 16:9 | 1280*720 | 1920*1080 |
| 21:9 | 1344*576 | 2048*872 |

DashScope official APIs also expose negative_prompt, prompt_extend, and watermark, but baoyu-image-gen does not expose them as dedicated CLI flags today.

Official references:
- Qwen-Image API
- Text-to-image guide
- Qwen-Image Edit API

OpenRouter Models
Use full OpenRouter model IDs, e.g.:
- google/gemini-3.1-flash-image-preview (recommended, supports image output and reference-image workflows)
- google/gemini-2.5-flash-image-preview
- black-forest-labs/flux.2-pro
- Other OpenRouter image-capable model IDs

Notes:
- OpenRouter image generation uses /chat/completions, not the OpenAI /images endpoints
- If --ref is used, choose a multimodal model that supports image input and image output
- --imageSize maps to OpenRouter imageGenerationOptions.size; --size is converted to the nearest OpenRouter size and inferred aspect ratio when possible

Replicate Models
Supported model formats:
- owner/name (recommended for official models), e.g. google/nano-banana-pro
- owner/name:version (community models by version), e.g. stability-ai/sdxl:<version>

Examples:

    # Use Replicate default model
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

    # Override model explicitly
    ${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
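
The `--model` override in the last command is the top of the priority chain described under Model Resolution (CLI flag, then EXTEND.md, then env var, then built-in default). A minimal sketch of that chain follows; `resolveModel`, `ModelSources` and `modelBanner` are hypothetical names, and the caller is assumed to have already read the three optional sources.

```typescript
// Sketch only: resolveModel, ModelSources and modelBanner are hypothetical names.
interface ModelSources {
  cliModel?: string;             // --model
  extendDefault?: string | null; // EXTEND.md default_model.[provider]
  envModel?: string;             // the provider's *_IMAGE_MODEL env var
  builtinDefault: string;
}

function resolveModel(s: ModelSources): { model: string; source: string } {
  if (s.cliModel) return { model: s.cliModel, source: "--model" };
  if (s.extendDefault) return { model: s.extendDefault, source: "EXTEND.md" };
  if (s.envModel) return { model: s.envModel, source: "env" };
  return { model: s.builtinDefault, source: "built-in default" };
}

// The line the agent is required to show before each generation.
function modelBanner(provider: string, model: string): string {
  return `Using ${provider} / ${model}`;
}
```

With the values from the Model Resolution example (EXTEND.md set to gemini-3-pro-image-preview, env var set to gemini-3.1-flash-image-preview), this sketch selects the EXTEND.md value.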

Provider Selection

1. --ref provided + no --provider → auto-select Google first, then OpenAI, then OpenRouter, then Replicate (Jimeng and Seedream do not support reference images)
2. --provider specified → use it (if --ref, must be google, openai, openrouter, or replicate)
3. Only one API key available → use that provider
4. Multiple available → default to Google
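
The selection rules can be sketched as one function. `selectProvider` is a hypothetical name and this is not the script's implementation. Two points are assumptions: the reference-image order is applied only among providers that have an API key configured, and when several keys are available but Google is not one of them, the first available provider is used (the rules above do not cover that case).

```typescript
// Sketch only: selectProvider is a hypothetical name.
type Provider = "google" | "openai" | "openrouter" | "dashscope" | "jimeng" | "seedream" | "replicate";

// Providers accepted with --ref, in auto-selection order.
const REF_ORDER: Provider[] = ["google", "openai", "openrouter", "replicate"];

function selectProvider(available: Provider[], opts: { provider?: Provider; hasRef?: boolean }): Provider {
  if (opts.provider) {
    if (opts.hasRef && !REF_ORDER.includes(opts.provider)) {
      throw new Error(`--ref is not supported with --provider ${opts.provider}`);
    }
    return opts.provider;
  }
  if (opts.hasRef) {
    // ASSUMPTION: only providers with a configured key are considered.
    const pick = REF_ORDER.find((p) => available.includes(p));
    if (pick === undefined) throw new Error("no configured provider supports reference images");
    return pick;
  }
  const [first] = available;
  if (first === undefined) throw new Error("no API key configured");
  if (available.length === 1) return first;
  // ASSUMPTION: without Google among several keys, fall back to the first one.
  return available.includes("google") ? "google" : first;
}
```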

Quality Presets

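| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| normal | 1K | 1024px | 1K | | Quick previews |
| 2k (default) | 2K | 2048px | 2K | | Covers, illustrations, infographics |

Google/OpenRouter imageSize: Can be overridden with --imageSize 1K|2K|4K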
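
For Google and OpenRouter, the relationship between the quality preset and the image size can be sketched as follows. `resolveImageSize` is a hypothetical name. The mapping (normal → 1K, 2k → 2K, with an explicit --imageSize taking precedence) comes from the table and the override note above.

```typescript
// Sketch only: resolveImageSize is a hypothetical name.
type Quality = "normal" | "2k";
type ImageSize = "1K" | "2K" | "4K";

function resolveImageSize(quality: Quality = "2k", imageSizeFlag?: ImageSize): ImageSize {
  if (imageSizeFlag) return imageSizeFlag; // --imageSize overrides the preset
  return quality === "normal" ? "1K" : "2K";
}
```

Note that 4K is reachable only through the explicit flag, since neither preset maps to it.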

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
- Google multimodal: uses imageConfig.aspectRatio
- OpenAI: maps to closest supported size
- OpenRouter: sends imageGenerationOptions.aspect_ratio; if only --size is given, aspect ratio is inferred automatically
- Replicate: passes aspect_ratio to model; when --ref is provided without --ar, defaults to match_input_image
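
DashScope is not listed above because, for the qwen-image-2.0 family, the aspect ratio and quality preset are turned into a concrete size using the recommended-size table in the DashScope Models section. A sketch of that lookup follows; `resolveQwenImage2Size` is a hypothetical name. Per the document, --size wins over --ar and the total pixel count must stay between 512*512 and 2048*2048. Returning `undefined` when nothing applies (so the API default is used) is an assumption.

```typescript
// Sketch only: resolveQwenImage2Size is a hypothetical name. Table values are
// the recommended qwen-image-2.0 sizes listed under DashScope Models.
type Quality = "normal" | "2k";
interface Size { width: number; height: number; }

const QWEN_IMAGE_2_SIZES: Record<string, Record<Quality, [number, number]>> = {
  "1:1": { normal: [1024, 1024], "2k": [1536, 1536] },
  "2:3": { normal: [768, 1152], "2k": [1024, 1536] },
  "3:2": { normal: [1152, 768], "2k": [1536, 1024] },
  "3:4": { normal: [960, 1280], "2k": [1080, 1440] },
  "4:3": { normal: [1280, 960], "2k": [1440, 1080] },
  "9:16": { normal: [720, 1280], "2k": [1080, 1920] },
  "16:9": { normal: [1280, 720], "2k": [1920, 1080] },
  "21:9": { normal: [1344, 576], "2k": [2048, 872] },
};

function resolveQwenImage2Size(opts: { size?: string; ar?: string; quality?: Quality }): Size | undefined {
  if (opts.size) {
    // --size wins over --ar. Accepts both 2048x872 and 2048*872.
    const m = /^(\d+)[x*](\d+)$/.exec(opts.size);
    if (!m) throw new Error(`unrecognized size: ${opts.size}`);
    const width = Number(m[1]);
    const height = Number(m[2]);
    const pixels = width * height;
    if (pixels < 512 * 512 || pixels > 2048 * 2048) {
      throw new Error(`total pixels out of range: ${opts.size}`);
    }
    return { width, height };
  }
  if (opts.ar) {
    const row = QWEN_IMAGE_2_SIZES[opts.ar];
    if (!row) return undefined; // ratio not in the table
    const [width, height] = row[opts.quality ?? "2k"];
    return { width, height };
  }
  return undefined; // ASSUMPTION: let the API apply its own default size
}
```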

Generation Mode

Default: Sequential generation.

Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.

| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |

Execution choice:

| Situation | Preferred approach | Why |
|-----------|--------------------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from baoyu-article-illustrator with outline.md + prompts/ | Batch (build-batch.ts -> --batchfile) | That workflow already produces prompt files, so direct batch execution is the intended path |

Rule of thumb:
- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration

Parallel behavior:
- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with --jobs
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons

Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint

Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
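
The worker-count and retry rules listed under Parallel behavior can be sketched as two small helpers. Both names (`chooseWorkerCount`, `withRetries`) are hypothetical. Reading "automatic" as one worker per pending task up to the cap is an assumption, as is capping an explicit --jobs value by the configured maximum.

```typescript
// Sketch only: both helpers are hypothetical, not the script's scheduler.

// Returns 1 for sequential mode; otherwise the number of parallel workers.
function chooseWorkerCount(pendingTasks: number, cliJobs?: number, cap = 10): number {
  if (pendingTasks < 2) return 1; // fewer than 2 pending tasks: sequential
  // ASSUMPTION: "auto" means one worker per pending task, limited by the cap.
  const requested = cliJobs ?? pendingTasks;
  return Math.max(1, Math.min(requested, pendingTasks, cap));
}

// Runs an async task up to maxAttempts times and rethrows the last error.
async function withRetries<T>(run: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

A real scheduler would also apply the provider-specific concurrency and start-interval limits, which this sketch leaves out.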
