## When to Use

- User wants a full content package for a specific platform (WeChat article, Xiaohongshu post, narration script)
- User says "帮我写篇公众号", "小红书图文", "口播稿", "create content"
- User provides a URL/text/topic and wants it turned into platform-ready content with images

## When NOT to Use

- User wants a single image without a content workflow → use image-gen directly
- User wants a single TTS audio → use tts directly
- User wants to transcribe audio → use asr directly
- User wants a podcast episode → use podcast directly
- User wants to extract content from a URL without further processing → use content-parser directly

Creator is for **multi-step content production** that combines writing + media generation into a platform-ready package.
## Purpose

Generate platform-specific content packages by orchestrating existing skills. Input: a topic, URL, text, or audio/video file. Output: a folder with the article/script, images, and metadata — ready to publish.
## Hard Constraints

- Use `listenhub` CLI commands for image-gen and TTS. Use curl for content-parser (see `content-parser/SKILL.md` § API Reference)
- Always read config following `shared/config-pattern.md` before any interaction
- Follow `shared/cli-patterns.md` for polling, errors, and interaction patterns
- Never save files to `~/Downloads/` or `.listenhub/` — save content packages to the current working directory
- JSON parsing: use `jq` only (no python3, awk)
## Step -1: API Key Check

Deferred. The API key is checked at the confirmation gate (Step 4) only when the pipeline requires remote API calls. See Hard Constraints above.
## Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot).

**If the file doesn't exist** — silently create it with defaults and proceed:

```bash
mkdir -p ".listenhub/creator" ".listenhub/creator/styles"
cat > ".listenhub/creator/config.json" << 'EOF'
{"outputMode":"download","language":null,"preferences":{"wechat":{"history":[]},"xiaohongshu":{"mode":"both","history":[]},"narration":{"defaultSpeaker":null,"history":[]}}}
EOF
CONFIG_PATH=".listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```

User style preferences are stored as markdown files in `.listenhub/creator/styles/`:

- `.listenhub/creator/styles/wechat.md`
- `.listenhub/creator/styles/xiaohongshu.md`
- `.listenhub/creator/styles/narration.md`

These files are plain markdown — one directive per line. If a file does not exist, no custom style is applied. Users can edit these files directly.

Note: `outputMode` defaults to `"download"` (not the usual `"inline"`) because creator always produces multi-file output folders that must be saved to disk.
**If the file exists** — read the config silently and proceed:

```bash
CONFIG_PATH=".listenhub/creator/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/creator/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
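Individual settings can then be read from the loaded `$CONFIG` with `jq` (per the jq-only Hard Constraint). A minimal sketch, using the default config shape from this step as sample data:

```shell
# Sample config matching the Step 0 defaults; read fields with jq only.
CONFIG='{"outputMode":"download","language":null,"preferences":{"xiaohongshu":{"mode":"both"}}}'
OUTPUT_MODE=$(echo "$CONFIG" | jq -r '.outputMode')
# "// both" supplies the documented default if the key is missing
XHS_MODE=$(echo "$CONFIG" | jq -r '.preferences.xiaohongshu.mode // "both"')
echo "$OUTPUT_MODE $XHS_MODE"   # download both
```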
### Setup Flow (user-initiated reconfigure only)

Only run this when the user explicitly asks to reconfigure. Display current settings:

```
当前配置 (creator):
输出方式:{outputMode}
小红书模式:{both / cards / long-text}
```

Ask:

1. `outputMode` — follow `shared/output-mode.md` § Setup Flow Question.
2. `xiaohongshu.mode` — "小红书默认模式?"
   - "图文 + 长文(both)"
   - "仅图文卡片(cards)"
   - "仅长文(long-text)"
## Interaction Flow

### Step 1: Understand Input

The user provides input along with their request. Classify the input:

| Input Type | Detection | Auto Action |
|---|---|---|
| URL (web/article) | `http(s)://` prefix, not an audio/video URL | Will call content-parser (requires API key) |
| URL (audio/video) | Extension `.mp3/.mp4/.wav/.m4a/.webm` or domain is youtube.com/bilibili.com/douyin.com | Will download + call `coli asr` to transcribe |
| Local audio file | File path exists, extension is audio/video | Will call `coli asr` directly |
| Local text file | File path exists, extension is `.txt/.md/.json` | Read file content |
| Raw text | Multi-line or >50 chars, not a URL/path | Use directly as material |
| Topic/keywords | Short text (<50 chars), no URL/path pattern | AI writes from scratch |
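The detection column above can be sketched as a shell heuristic. This is a hypothetical helper, not part of the skill: the real flow also verifies that local paths exist and treats multi-line input as raw text, both of which are omitted here for brevity.

```shell
# Heuristic sketch of the classification table (hypothetical helper).
classify_input() {
  local s="$1"
  case "$s" in
    *.mp3|*.mp4|*.wav|*.m4a|*.webm)
      case "$s" in
        http*) echo "media-url" ;;     # audio/video URL → download + coli asr
        *)     echo "local-media" ;;   # local audio file → coli asr directly
      esac ;;
    *youtube.com/*|*bilibili.com/*|*douyin.com/*) echo "media-url" ;;
    http://*|https://*) echo "web-url" ;;          # → content-parser
    *.txt|*.md|*.json)  echo "local-text" ;;       # → read file content
    *)
      # >50 chars → raw material; otherwise a topic/keywords request
      if [ "${#s}" -gt 50 ]; then echo "raw-text"; else echo "topic"; fi ;;
  esac
}

classify_input "https://example.com/ep01.mp3"   # media-url
classify_input "https://example.com/blog/post"  # web-url
classify_input "AI agent trends"                # topic
```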
**Style reference detection:** If the user's prompt contains keywords like "参考", "风格", "照着…写", "style", "reference", the associated input (file path / URL / pasted text) should be classified as a **style reference** rather than content material. A single request may contain both material and a style reference — classify them separately. If only a style reference is provided with no material or topic, this is a **standalone style learning** request (see Step 3).
**For URL (audio/video) inputs:**

1. Download to `/tmp/creator-{slug}.{ext}` using `curl -L -o`
2. Check `coli` is available: `which coli 2>/dev/null && echo yes || echo no`
3. If `coli` is missing: inform the user to install it (`npm install -g @marswave/coli`), or ask them to paste text instead
4. Transcribe: `coli asr -j --model sensevoice "/tmp/creator-{slug}.{ext}"` and extract the text from the JSON result
5. Cleanup: `rm "/tmp/creator-{slug}.{ext}"`
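How the `/tmp/creator-{slug}.{ext}` path is derived is not specified above; a minimal sketch, assuming the slug and extension come from the URL's last path segment (query strings are not handled):

```shell
# Derive /tmp/creator-{slug}.{ext} from a media URL (derivation is an assumption).
url="https://example.com/podcast/ep01.mp3"
base="${url##*/}"    # last path segment → ep01.mp3
ext="${base##*.}"    # extension → mp3
slug="${base%.*}"    # filename without extension → ep01
tmpfile="/tmp/creator-${slug}.${ext}"
echo "$tmpfile"      # /tmp/creator-ep01.mp3
# Then: curl -L -o "$tmpfile" "$url"
```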
**For URL (web/article) inputs:** content-parser will be called during pipeline execution (after confirmation).
### Step 2: Template Matching

If the user specified a platform in their prompt, match directly:

- "公众号", "wechat", "微信" → wechat
- "小红书", "xiaohongshu", "xhs" → xiaohongshu
- "口播", "narration", "脚本" → narration

If no platform was specified, ask via AskUserQuestion:

- Question: "Which content template?" / "用哪个创作模板?"
- Options (adapt language to the user's input):
  - "WeChat article (公众号长文)" — Long-form article with AI illustrations
  - "Xiaohongshu (小红书)" — Image cards + long text post
  - "Narration script (口播稿)" — Spoken script with optional audio
### Step 2.5: Topic Assistance

This step runs only when the user's input is a topic or keywords (short text <50 chars, no URL/path). Skip it if the user provided a URL, file, or substantial text.

Read the selected platform's `methodology.md`:

- WeChat: `creator/templates/wechat/methodology.md`
- Xiaohongshu: `creator/templates/xiaohongshu/methodology.md`
- Narration: `creator/templates/narration/methodology.md`

Evaluate the topic using the three-circle Venn model:

- 用户的专业领域 (creator's expertise)
- 读者的普遍兴趣 (reader interest)
- 当下的时间节点 (current timing/relevance)

Run the HKR quality filter:

- **H (Happy)** — 足够有趣、有悬念?
- **K (Knowledge)** — 有信息量?看完能学到新东西?
- **R (Resonance)** — 能戳中情绪?让人"对对对我也这么想"?

Then:

- If the topic scores ≥2 of 3 HKR criteria: proceed with the topic.
- If the topic scores <2: proactively suggest 2-3 alternative angles to the user via AskUserQuestion.
- If the topic is vague: ask for more specifics — key points, personal experiences, what excites or frustrates them.
### Step 3: Style Extraction (if style reference provided)

This step runs only when the user provided a style reference in Step 1. If no style reference was detected, skip to Step 3a.
Read the reference content:

- Local file → Read tool
- URL → content-parser API (requires API key)
- Pasted text → use directly

Analyze and extract style directives. The AI reads the reference content and extracts 3-5 concrete style directives, focusing on observable patterns:

- Sentence length and paragraph structure
- Tone and register (formal/casual, first/third person)
- Use of rhetorical devices (questions, lists, bold, quotes)
- Vocabulary level and domain jargon
- Formatting habits (heading style, emoji usage, whitespace)

Present to the user for confirmation:

```
从参考文章中提炼了以下风格特征:
1.
2.
3.
...
你可以修改或删除其中的条目。确认后本次生成会应用这些规则。
```

Wait for user confirmation. The confirmed directives become `sessionStyle` — applied to this generation only.

After the user confirms the style directives, proactively ask whether to persist them:

```
要将这些风格规则保存吗?(保存后每次生成{platform}内容都会应用)
```

If yes → write to `.listenhub/creator/styles/{platform}.md`. If no → apply to this generation only.

**Standalone style learning:** If the user only provided a style reference without material/topic (e.g., "学习一下这篇文章的风格"), run the extraction above, then **persist directly** to `.listenhub/creator/styles/{platform}.md` without asking — the user's intent to save is already explicit. Confirm with a brief message: "已保存到 styles/{platform}.md". Do not proceed to content generation.
### Step 3a: Prototype Classification

Read the selected platform's prototype file:

- WeChat: `creator/templates/wechat/article-prototypes.md`
- Xiaohongshu: `creator/templates/xiaohongshu/content-prototypes.md`
- Narration: `creator/templates/narration/script-prototypes.md`

Based on the user's material/topic, auto-match the best-fit prototype using the matching heuristics table in the prototype file.

Present the recommendation to the user via AskUserQuestion:

- Question: "这篇内容最适合哪种写法?" / "Which content prototype fits best?"
- Options: [list all prototypes for the platform, recommended one first with "(Recommended)" suffix]

The selected prototype determines the narrative structure and the L3-5 review criteria for writing.
### Step 3b: Preset Selection (if applicable)

If the selected template uses illustration or card presets **and** the mode requires images, the preset MUST be chosen **before** the confirmation gate so it can be displayed in the summary.

**Skip this step entirely** for:

- Narration template (no visual presets)
- Xiaohongshu with `preferences.xiaohongshu.mode` = `"long-text"` (no cards or images generated)

Otherwise:

- Read the template's preset section to get the available presets and the topic-matching table.
- **If the user already specified a preset** in their prompt (e.g., "用水彩风格"): use that preset directly.
- **If not specified**: ask the user via AskUserQuestion. Output a one-line hint first: "配图风格可以随时换,先选一个开始吧". List all available presets with their Chinese labels (from the frontmatter label field). Use the topic-matching table to put the most relevant option first (marked "Recommended"), but always let the user choose.

### Step 4: Confirmation Gate

Check the API key if the pipeline needs remote APIs:

- WeChat template always needs image-gen → requires API key
- Xiaohongshu cards mode needs image-gen → requires API key
- Xiaohongshu long-text only → no API key needed
- Narration without TTS → no API key needed
- Web/article URL input → needs content-parser → requires API key (audio/video URLs use local `coli asr`, no API key needed)

If an API key is required and missing: for CLI-based calls, run `listenhub auth login`. For content-parser calls, configure `LISTENHUB_API_KEY` (see `content-parser/SKILL.md` § Authentication).

Show the confirmation summary:

```
准备生成内容:
模板:{WeChat article / Xiaohongshu / Narration}
输入:{topic description / URL / text excerpt...}
输出目录:{slug}-{platform}/
需要 API 调用:{content-parser, image-gen, ...}
风格偏好:{styles/{platform}.md 已配置 / 使用默认风格}
配图/卡片预设:{preset label / 不适用}
文章/内容原型:{selected prototype name}
本次风格参考:{M条来自参考文章 / 无}

确认开始?
```

Wait for an explicit "yes" / confirmation before proceeding.

### Step 5: Execute Pipeline

Read the selected template file and execute:
```bash
# The template file paths
TEMPLATE="creator/templates/$PLATFORM/template.md"
STYLE="creator/templates/$PLATFORM/style.md"
```

**For URL inputs — extract content first:**

```bash
# Submit content extraction
RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Source: skills" \
  -d "{\"source\":{\"type\":\"url\",\"uri\":\"$INPUT_URL\"}}")
TASK_ID=$(echo "$RESPONSE" | jq -r '.data.taskId')
```

Then poll in the background. Run this as a separate Bash call with `run_in_background: true` and `timeout: 600000` (per `shared/cli-patterns.md`). The polling loop itself runs up to 300s (60 polls × 5s); `timeout: 600000` is set higher at the tool level to give the Bash process headroom beyond the poll budget:

```bash
# Run with: run_in_background: true, timeout: 600000
# Task status endpoint: see content-parser/SKILL.md § API Reference
for i in $(seq 1 60); do
  RESULT=$(curl -sS "{task-status-endpoint}/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed)    echo "FAILED: $RESULT" >&2; exit 1 ;;
    *)         sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
```

Extract content:

```bash
MATERIAL=$(echo "$RESULT" | jq -r '.data.data.content')
```

If extraction fails: tell the user "URL 解析失败,你可以直接粘贴文字内容给我" and stop.

Then follow the platform template — read `template.md` and execute each step. The template specifies the exact writing instructions and API calls. See `creator/templates/{platform}/template.md` for template contents.

**Writing engine integration:** Each platform's `template.md` now includes writing-engine references and a self-review loop. The template handles loading `writing-engine/` files, applying the selected prototype's narrative structure, and running L1-L4 quality review after writing. See each platform's `template.md` for details.

**Style application:** When writing content, apply style directives in this priority order (higher overrides lower):

1. `sessionStyle` — directives from the current style reference (Step 3), if any
2. `.listenhub/creator/styles/{platform}.md` — persisted user style directives (if the file exists)
3. `templates/{platform}/style.md` — baseline platform style

**For image generation** (called by the wechat and xiaohongshu templates):

```bash
RESPONSE=$(listenhub image create \
  --prompt "{image prompt}" \
  --aspect-ratio "{ratio}" \
  --json)
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
# macOS uses -D, Linux uses -d (detect platform)
if [[ "$(uname)" == "Darwin" ]]; then
  echo "$BASE64_DATA" | base64 -D > "{output-path}/{name}.jpg"
else
  echo "$BASE64_DATA" | base64 -d > "{output-path}/{name}.jpg"
fi
```

On 429: exponential backoff (wait 15s → 30s → 60s), retry up to 3 times. On failure after retries: skip this image and annotate it in the output summary. Generate images sequentially (not in parallel) to respect rate limits.

**For TTS** (called by the narration template when the user wants audio):

```bash
listenhub tts create --text "$(cat /tmp/lh-content.txt)" --speaker "$SPEAKER_ID" --json \
  | jq -r '.data' | base64 -D > "{slug}-narration/audio.mp3"   # -D on macOS, -d on Linux
```

### Step 6: Assemble Output

Create the output folder and write all files:

```bash
SLUG="{topic-slug}"
OUTPUT_DIR="${SLUG}-{platform}"
# Dedup folder name
i=2
while [ -d "$OUTPUT_DIR" ]; do
  OUTPUT_DIR="${SLUG}-{platform}-${i}"
  i=$((i + 1))
done
mkdir -p "$OUTPUT_DIR"
```
Write content files per the template spec. Then write `meta.json`:

```json
{
  "title": "...",
  "slug": "...",
  "platform": "wechat|xiaohongshu|narration",
  "date": "YYYY-MM-DD",
  "tags": ["...", "..."],
  "summary": "..."
}
```
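Under the jq-only constraint, one way to write this file is `jq -n` with `--arg` bindings. A sketch — all field values here are illustrative placeholders, not real output:

```shell
# Write meta.json with jq -n (illustrative values; real values come from the pipeline).
OUTPUT_DIR="demo-wechat"
mkdir -p "$OUTPUT_DIR"
jq -n \
  --arg title "示例标题" --arg slug "demo" --arg platform "wechat" \
  --arg date "$(date +%Y-%m-%d)" --arg summary "一句话摘要" \
  '{title: $title, slug: $slug, platform: $platform, date: $date,
    tags: ["tag1", "tag2"], summary: $summary}' \
  > "$OUTPUT_DIR/meta.json"
```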
### Step 7: Present Result

```
✅ 内容已生成!保存在 {OUTPUT_DIR}/
📄
🖼️ images/ — N 张配图(如有)
📋 meta.json — 标题、标签、摘要
```

(Adapt language to the user's input language per Hard Constraints.)
### Step 8: Update Preferences

Record this generation in history:

```bash
NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg platform "$PLATFORM" \
  --arg date "$(date +%Y-%m-%d)" \
  --arg topic "$TOPIC" \
  '.preferences[$platform].history = (.preferences[$platform].history + [{"date": $date, "topic": $topic}])[-5:]')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
```

Keep only the last 5 history entries per platform.
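The `[-5:]` slice is what enforces that cap. A quick self-contained demonstration on sample data (sample JSON only, not a real config file): appending a sixth entry drops the oldest one.

```shell
# Sample config with 5 history entries; append a 6th, then slice with [-5:].
SAMPLE='{"preferences":{"wechat":{"history":[{"topic":"t1"},{"topic":"t2"},{"topic":"t3"},{"topic":"t4"},{"topic":"t5"}]}}}'
TRIMMED=$(echo "$SAMPLE" | jq --arg platform "wechat" --arg date "2025-01-01" --arg topic "t6" \
  '.preferences[$platform].history = (.preferences[$platform].history + [{"date": $date, "topic": $topic}])[-5:]')
echo "$TRIMMED" | jq -r '.preferences.wechat.history | "\(length) entries, oldest: \(.[0].topic)"'
# 5 entries, oldest: t2
```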
Note: `cardStyle` from the spec is deferred — not implemented in the V1 config. It can be added later when card style customization is needed.
## Manual Style Tuning

**Adding style directives:** If the user says "记住:{style directive}" or "remember: {style directive}":

1. Detect which platform it applies to (from context, or ask)
2. Append the directive as a new line to `.listenhub/creator/styles/{platform}.md` (create the file if it doesn't exist)

This also applies after Step 3 (Style Extraction): if the user says "记住这个风格" after reviewing extracted directives, write all confirmed directives to `.listenhub/creator/styles/{platform}.md`.
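The append itself is a one-liner; a minimal sketch, where the platform and the directive text are illustrative examples:

```shell
# Append a remembered directive (one directive per line, per Step 0).
# Platform "wechat" and the directive text are illustrative.
STYLE_FILE=".listenhub/creator/styles/wechat.md"
mkdir -p "$(dirname "$STYLE_FILE")"   # create the file's directory if missing
printf '%s\n' "- 每段不超过三行" >> "$STYLE_FILE"
```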
**Resetting style:** If the user says "重置风格偏好" or "reset style":

1. Ask which platform (or all)
2. Delete `.listenhub/creator/styles/{platform}.md`
## API Reference

- Authentication: `shared/cli-authentication.md`
- Image generation: CLI `listenhub image create` (see `shared/cli-patterns.md`)
- Content extraction: `content-parser/SKILL.md` § API Reference (Inlined)
- TTS (text-to-speech): CLI `listenhub tts create` (see `shared/cli-patterns.md`)
- Speaker selection: `shared/speaker-selection.md`
- Config pattern: `shared/config-pattern.md`
- Common patterns (polling, errors): `shared/cli-patterns.md`
- Output mode: `shared/output-mode.md`
## Composability

- **Invokes**: content-parser (URL extraction), image-gen (illustrations/cards), tts (narration audio), asr (audio/video transcription via `coli`)
- **Invoked by**: standalone — user triggers directly
- **Templates**: `creator/templates/{wechat,xiaohongshu,narration}/template.md` define per-platform pipelines
- **Style guides**: `creator/templates/{wechat,xiaohongshu,narration}/style.md` define per-platform writing tone