When to Use

- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"

When NOT to Use

- User wants audio-only content without visuals (use /speech or /podcast)
- User wants a podcast-style discussion (use /podcast)
- User wants to generate a standalone image (use /image-gen)
- User wants to read text aloud without video (use /speech)

Purpose

Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.

Hard Constraints

- No shell scripts. Construct curl commands from the API reference files listed under API Reference
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- Always read config following shared/config-pattern.md before any interaction
- Never hardcode speaker IDs; always fetch them from the speakers API
- Never save files to ~/Downloads/; use .listenhub/explainer/ from config
- Explainer uses exactly 1 speaker
- Mode must be info (for Info style) or story (for Story style), never slides (use the /slides skill instead)

Step -1: API Key Check

Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.

If the file doesn't exist, ask for the location, then create it immediately:

    mkdir -p ".listenhub/explainer"
    echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
    CONFIG_PATH=".listenhub/explainer/config.json"
    # (or $HOME/.listenhub/explainer/config.json for global)

Then run Setup Flow below.

If the file exists, read the config, display a summary, and confirm:
当前配置 (explainer)：
输出方式：{inline / download / both}
语言偏好：{zh / en / 未设置}
默认风格：{info / story / 未设置}
默认主播：{speakerName / 未设置}
Ask: "使用已保存的配置？" → 确认，直接继续 / 重新配置
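The summary step above could be rendered with jq along the following lines. This is a minimal sketch, not a command prescribed by the skill: the function name `show_config_summary` is hypothetical, and the speaker line is left out because turning a saved speaker ID into a display name requires a call to the speakers API.

```shell
# Hypothetical helper: print the saved explainer config in the summary format.
# Fields that are null (or missing) fall back to "未设置", as in the template.
show_config_summary() {
  local config_path="$1"
  jq -r '
    "当前配置 (explainer)：",
    "输出方式：\(.outputMode // "未设置")",
    "语言偏好：\(.language // "未设置")",
    "默认风格：\(.defaultStyle // "未设置")"
  ' "$config_path"
}
```

Calling `show_config_summary "$CONFIG_PATH"` prints one line per field, which can be shown to the user before asking whether to reuse the saved configuration.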
Setup Flow (first run or reconfigure)
Ask these questions in order, then save all answers to config at once:
outputMode: Follow shared/output-mode.md § Setup Flow Question.

Language (optional): "默认语言？"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep null

Style (optional): "默认风格？"
- "Info — 信息展示型"
- "Story — 故事叙述型"
- "每次手动选择" → keep null

After collecting answers, save immediately:

    # Follow shared/output-mode.md § Save to Config
    NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
    echo "$NEW_CONFIG" > "$CONFIG_PATH"
    CONFIG=$(cat "$CONFIG_PATH")

Note: defaultSpeakers are saved after generation (see After Successful Generation section).
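Because the setup flow saves all answers at once, and a "choose manually each time" answer must stay null, the save step might be written as one jq call. The sketch below is an illustration under assumptions: the helper name `save_setup_answers` is hypothetical, and it assumes an empty string is used to represent "no default chosen".

```shell
# Hypothetical helper: write all setup answers to the config in one pass.
# An empty language or style means "choose manually each time" and is stored as null.
save_setup_answers() {
  local config_path="$1" output_mode="$2" language="$3" style="$4"
  local new_config
  new_config=$(jq \
    --arg m "$output_mode" \
    --arg l "$language" \
    --arg s "$style" \
    '. + {
       "outputMode": $m,
       "language": (if $l == "" then null else $l end),
       "defaultStyle": (if $s == "" then null else $s end)
     }' "$config_path") || return 1
  # The file is fully read before this line runs, so overwriting it is safe.
  echo "$new_config" > "$config_path"
}
```

For example, `save_setup_answers "$CONFIG_PATH" both zh ""` would set outputMode to "both" and language to "zh" while leaving defaultStyle as null. Keys not mentioned in the filter, such as outputDir and defaultSpeakers, are preserved.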

Interaction Flow

Step 1: Topic / Content

Free text input. Ask the user:

What would you like to explain or introduce?

Accept: topic description, text content, or concept to explain.

Step 2: Language

If config.language is set, pre-fill it, show it in the summary, and skip this question.

Otherwise ask:

Question: "What language?"

Options:

- "Chinese (zh)" — Content in Mandarin Chinese

- "English (en)" — Content in English

Step 3: Style

If config.defaultStyle is set, pre-fill it, show it in the summary, and skip this question.

Otherwise ask:

Question: "What style of explainer?"

Options:

- "Info" — Informational, factual presentation style

- "Story" — Narrative, storytelling approach
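The style answer has to become the API `mode` value, and the hard constraints allow only `info` or `story`. A minimal sketch of that mapping follows; the function name `style_to_mode` and the error messages are hypothetical, not part of the skill.

```shell
# Hypothetical helper: map the Step 3 answer to the API `mode` value.
# Only "info" and "story" are valid for explainers; anything else is rejected.
style_to_mode() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    info)  echo "info" ;;
    story) echo "story" ;;
    slides)
      echo "slides is not an explainer mode; use the /slides skill" >&2
      return 1 ;;
    *)
      echo "unknown style: $1" >&2
      return 1 ;;
  esac
}
```

With this, `MODE=$(style_to_mode "Info")` yields `info`, while `style_to_mode "slides"` fails with a non-zero status, so an invalid mode never reaches the request body.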

Step 4: Speaker Selection

Follow shared/speaker-selection.md for the full selection flow, including:

- Default from config.defaultSpeakers.{language} (skip step if set)
- Text table + free-text input
- Input matching and re-prompt on no match

Only 1 speaker is supported for explainer videos.

Step 5: Output Type

Question: "What output do you want?"

Options:

- "Text script only" — Generate narration script, no video

- "Text + Video" — Generate full explainer video with AI visuals

Step 6: Confirm & Generate

Summarize all choices:

Ready to generate explainer:

Topic:

Language:

Style:

Speaker:

Output:

Proceed?

Wait for explicit confirmation before calling any API.
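The confirmation gate can be pictured as two small helpers: one that renders the summary and one that accepts only an explicit affirmative reply. This is a sketch under assumptions: both function names are hypothetical, and the set of accepted replies is an illustrative choice rather than something the skill specifies.

```shell
# Hypothetical helper: render the Step 6 summary from the collected choices.
render_summary() {
  printf 'Ready to generate explainer:\n'
  printf 'Topic: %s\n' "$1"
  printf 'Language: %s\n' "$2"
  printf 'Style: %s\n' "$3"
  printf 'Speaker: %s\n' "$4"
  printf 'Output: %s\n' "$5"
  printf 'Proceed?\n'
}

# Hypothetical helper: succeed only on an explicit affirmative reply.
# Anything else, including an empty reply, is treated as "do not proceed".
# The accepted words below are an assumption for illustration.
is_confirmed() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    y|yes|ok|proceed|confirm) return 0 ;;
    *) return 1 ;;
  esac
}
```

Treating every unrecognized reply as a refusal keeps the default safe: no API call is made unless the user clearly agreed.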

Workflow

Submit (foreground): POST /storybook/episodes with content, speaker, language, mode → extract episodeId

Tell the user the task is submitted

Poll (background): Run the following exact bash command with run_in_background: true and timeout: 600000. Do NOT use python3, awk, or any other JSON parser; use jq as shown:

    EPISODE_ID=""  # set to the episodeId extracted in the submit step
    for i in $(seq 1 30); do
      RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
        -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
      STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"')
      case "$STATUS" in
        success|completed) echo "$RESULT"; exit 0 ;;
        failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
        *) sleep 10 ;;
      esac
    done
    echo "TIMEOUT" >&2; exit 2

When notified, download and present script:

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: Present the script inline.

Present:

解说脚本已生成！

「{title}」

在线查看：https://listenhub.ai/app/explainer/{episodeId}

download or both: Also save the script file.

- Create .listenhub/explainer/YYYY-MM-DD-{episodeId}/
- Write {episodeId}.md from the generated script content
- Present the download path in addition to the above summary.

If video requested: POST /storybook/episodes/{episodeId}/video (foreground) → poll again (background) using the exact bash command below with run_in_background: true and timeout: 600000. Poll for videoStatus, not processStatus:

    EPISODE_ID=""  # set to the episodeId extracted in the submit step
    for i in $(seq 1 30); do
      RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
        -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
      STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"')
      case "$STATUS" in
        success|completed) echo "$RESULT"; exit 0 ;;
        failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
        *) sleep 10 ;;
      esac
    done
    echo "TIMEOUT" >&2; exit 2

When notified, download and present result:

Present result

Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.

inline or both: Display video URL and audio URL as clickable links.

Present:

解说视频已生成！

视频链接：{videoUrl}

音频链接：{audioUrl}

时长：{duration}s

消耗积分：{credits}

download or both: Also download the audio file.

    DATE=$(date +%Y-%m-%d)
    JOB_DIR=".listenhub/explainer/${DATE}-{jobId}"
    mkdir -p "$JOB_DIR"
    curl -sS -o "${JOB_DIR}/{jobId}.mp3" "{audioUrl}"

Present the download path in addition to the above summary.
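The two polling loops in the workflow above differ only in the status field they watch. As an illustration of that shared shape, the sketch below factors the loop into one function whose fetch step is injectable, so it can be exercised without network access. It is a reading aid, not a replacement for the exact commands above: the function name `poll_status` and its parameters are hypothetical.

```shell
# Hypothetical helper: poll until the watched status field reaches a terminal value.
#   $1 = field under .data to watch (processStatus or videoStatus)
#   $2 = command that prints the episode JSON (curl in practice, a stub in tests)
#   $3 = maximum attempts, $4 = seconds to wait between attempts
# Return codes mirror the loops above: 0 success, 1 failed, 2 timeout.
poll_status() {
  local field="$1" fetch="$2" attempts="$3" interval="$4"
  local i=0 result status
  while [ "$i" -lt "$attempts" ]; do
    result=$($fetch 2>/dev/null)
    # Strip control characters before parsing; a missing field counts as "pending".
    status=$(printf '%s' "$result" | tr -d '\000-\037\177' \
      | jq -r --arg f "$field" '.data[$f] // "pending"')
    case "$status" in
      success|completed) printf '%s\n' "$result"; return 0 ;;
      failed|error)      printf 'FAILED: %s\n' "$result" >&2; return 1 ;;
      *)                 sleep "$interval" ;;
    esac
    i=$((i + 1))
  done
  echo "TIMEOUT" >&2
  return 2
}
```

With 30 attempts and a 10-second interval, as in the commands above, the loop gives up after roughly five minutes, which stays inside the 600000 ms background timeout.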

After Successful Generation

Update config with the choices made this session:

    NEW_CONFIG=$(echo "$CONFIG" | jq \
      --arg lang "{language}" \
      --arg style "{info/story}" \
      --arg speakerId "{speakerId}" \
      '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
    echo "$NEW_CONFIG" > "$CONFIG_PATH"

Estimated times:

- Text script only: 2-3 minutes
- Text + Video: 3-5 minutes
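As a worked example of the config update described under After Successful Generation, the sketch below runs the same jq filter on a freshly created config. The sample values (en, info, cozy-man-english) are taken from the example in this document and are illustrative only; in real use the speaker ID comes from the speakers API.

```shell
# Sample config as created in Step 0 (no defaults saved yet).
CONFIG='{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}'

# Same jq filter as in After Successful Generation, with sample values
# substituted for the {language}, {info/story}, and {speakerId} placeholders.
NEW_CONFIG=$(echo "$CONFIG" | jq -c \
  --arg lang "en" \
  --arg style "info" \
  --arg speakerId "cozy-man-english" \
  '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')

echo "$NEW_CONFIG"
```

After this run, language is "en", defaultStyle is "info", and defaultSpeakers holds a single-element list under the "en" key. Because the filter adds to the existing defaultSpeakers object, a later session in another language adds its own key without removing the "en" entry.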

API Reference

- Speaker list: shared/api-speakers.md
- Speaker selection guide: shared/speaker-selection.md
- Episode creation: shared/api-storybook.md
- Polling: shared/common-patterns.md § Async Polling
- Config pattern: shared/config-pattern.md

Composability

- Invokes: speakers API (for speaker selection); may invoke /speech for voiceover
- Invoked by: content-planner (Phase 3)
Example
User: "Create an explainer video introducing Claude Code"

Agent workflow:

- Topic: "Claude Code introduction"
- Ask language → "English"
- Ask style → "Info"
- Fetch speakers, user picks "cozy-man-english"
- Ask output → "Text + Video"

    curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \
      -H "Authorization: Bearer $LISTENHUB_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}],
        "speakers": [{"speakerId": "cozy-man-english"}],
        "language": "en",
        "mode": "info"
      }'

Poll until text is ready, then generate video if requested.
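When the topic text comes from free user input, quotes or newlines in it can break a hand-written JSON body. One way to avoid that is to let jq assemble the body, as sketched below. The helper name `build_episode_payload` is hypothetical; the field names follow the request shown in the example above, and the speaker ID passed in is assumed to come from the speakers API rather than being hardcoded.

```shell
# Hypothetical helper: build the episode-creation body with jq so that quotes
# or newlines in the topic text are escaped correctly.
#   $1 = content text, $2 = speakerId (from the speakers API),
#   $3 = language (zh or en), $4 = mode (info or story)
build_episode_payload() {
  jq -n \
    --arg content "$1" \
    --arg speakerId "$2" \
    --arg language "$3" \
    --arg mode "$4" \
    '{
       sources:  [{type: "text", content: $content}],
       speakers: [{speakerId: $speakerId}],
       language: $language,
       mode:     $mode
     }'
}

# Usage: the speaker ID here mirrors the example above for illustration only.
PAYLOAD=$(build_episode_payload 'Introduce "Claude Code": what it is' "cozy-man-english" "en" "info")
echo "$PAYLOAD"
```

The resulting string can be passed to the curl command above with `-d "$PAYLOAD"` in place of the inline JSON literal. The speakers array always holds exactly one entry, matching the single-speaker constraint.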
