MiniMax CLI: Agent Skill Guide

Use mmx to generate text, images, video, speech, and music, and to perform web search via the MiniMax AI platform.

Prerequisites

Install

npm install -g mmx-cli

Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json)

mmx auth login --api-key sk-xxxxx

Verify active auth source

mmx auth status

Or pass per-call

mmx text chat --api-key sk-xxxxx --message "Hello"

Region is auto-detected. Override with --region global or --region cn.

Agent Flags

Always use these flags in non-interactive (agent/CI) contexts:

| Flag | Purpose |
| --- | --- |
| --non-interactive | Fail fast on missing args instead of prompting |
| --quiet | Suppress spinners/progress; stdout is pure data |
| --output json | Machine-readable JSON output |
| --async | Return task ID immediately (video generation) |
| --dry-run | Preview the API request without executing |
| --yes | Skip confirmation prompts |

Commands

text chat

Chat completion. Default model: MiniMax-M2.7.

mmx text chat --message <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --message | string, required, repeatable | Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello") |
| --messages-file | string | JSON file with messages array. Use - for stdin |
| --system | string | System prompt |
| --model | string | Model ID (default: MiniMax-M2.7) |
| --max-tokens | number | Max tokens (default: 4096) |
| --temperature | number | Sampling temperature, range (0.0, 1.0] |
| --top-p | number | Nucleus sampling threshold |
| --stream | boolean | Stream tokens (default: on in TTY) |
| --tool | string, repeatable | Tool definition JSON or file path |

Single message

mmx text chat --message "user:What is MiniMax?" --output json --quiet

Multi-turn

mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

From file

cat conversation.json | mmx text chat --messages-file - --output json

stdout: response text (text mode) or full response object (json mode).

image generate

Generate images. Model: image-01.

mmx image generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string, required | Image description |
| --aspect-ratio | string | e.g. 16:9, 1:1 |
| --n | number | Number of images (default: 1) |
| --subject-ref | string | Subject reference: type=character,image=path-or-url |
| --out-dir | string | Download images to directory |
| --out-prefix | string | Filename prefix (default: image) |

mmx image generate --prompt "A cat in a spacesuit" --output json --quiet

stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet

stdout: saved file paths (one per line)

video generate

Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task; by default the command polls until completion.

mmx video generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string, required | Video description |
| --model | string | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
| --first-frame | string | First frame image |
| --callback-url | string | Webhook URL for completion |
| --download | string | Save video to specific file |
| --async | boolean | Return task ID immediately |
| --no-wait | boolean | Same as --async |
| --poll-interval | number | Polling interval (default: 5) |

Non-blocking: get task ID

mmx video generate --prompt "A robot." --async --quiet

stdout:

Blocking: wait and get file path

mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet

stdout: ocean.mp4

video task get

Query status of a video generation task.

mmx video task get --task-id <id> [--output json]

video download

Download a completed video by task ID.

mmx video download --file-id <id> [--out <path>]

speech synthesize

Text-to-speech. Default model: speech-2.8-hd. Max 10k chars.

mmx speech synthesize --text <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --text | string | Text to synthesize |
| --text-file | string | Read text from file. Use - for stdin |
| --model | string | speech-2.8-hd (default), speech-2.6, speech-02 |
| --voice | string | Voice ID (default: English_expressive_narrator) |
| --speed | number | Speed multiplier |
| --volume | number | Volume level |
| --pitch | number | Pitch adjustment |
| --format | string | Audio format (default: mp3) |
| --sample-rate | number | Sample rate (default: 32000) |
| --bitrate | number | Bitrate (default: 128000) |
| --channels | number | Audio channels (default: 1) |
| --language | string | Language boost |
| --subtitles | boolean | Download and save subtitles as .srt file (alongside --out audio file). API must support subtitles for the selected model. |
| --pronunciation | string, repeatable | Custom pronunciation |
| --sound-effect | string | Add sound effect |
| --out | string | Save audio to file |
| --stream | boolean | Stream raw audio to stdout |

mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet


stdout: hello.mp3

mmx speech synthesize --text "Hello" --subtitles --out hello.mp3

saves hello.mp3 + hello.srt (SRT subtitle file)

echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3

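The 10k-character limit mentioned for speech synthesize can be checked locally before the CLI is called. The following is a minimal sketch, not part of mmx: the helper name check_speech_text is hypothetical, and it assumes "10k chars" means 10000 characters as counted by wc -m.

```shell
# Hypothetical helper, not part of mmx: rejects text that exceeds the
# documented speech limit. Assumes "10k chars" means 10000 characters.
MAX_SPEECH_CHARS=10000

check_speech_text() {
  # Count characters in the first argument; the arithmetic expansion
  # strips any padding that wc adds on some platforms.
  _count=$(( $(printf '%s' "$1" | wc -m) ))
  if [ "$_count" -gt "$MAX_SPEECH_CHARS" ]; then
    echo "text has $_count chars, limit is $MAX_SPEECH_CHARS" >&2
    return 1
  fi
  return 0
}
```

One possible use is to guard the call: check_speech_text "$TEXT" && mmx speech synthesize --text "$TEXT" --out out.mp3 --quiet.
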
music generate

Generate music. Responds well to rich, structured descriptions. Model: music-2.6-free (unlimited for API key users, RPM = 3).

mmx music generate --prompt <text> [--lyrics <text>] [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string | Music style description (can be detailed) |
| --lyrics | string | Song lyrics with structure tags. Required unless --instrumental or --lyrics-optimizer is used. |
| --lyrics-file | string | Read lyrics from file. Use - for stdin |
| --lyrics-optimizer | boolean | Auto-generate lyrics from prompt. Cannot be used with --lyrics or --instrumental. |
| --instrumental | boolean | Generate instrumental music (no vocals). Cannot be used with --lyrics. |
| --vocals | string | Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies" |
| --genre | string | Music genre, e.g. folk, pop, jazz |
| --mood | string | Mood or emotion, e.g. warm, melancholic, uplifting |
| --instruments | string | Instruments to feature, e.g. "acoustic guitar, piano" |
| --tempo | string | Tempo description, e.g. fast, slow, moderate |
| --bpm | number | Exact tempo in beats per minute |
| --key | string | Musical key, e.g. C major, A minor, G sharp |
| --avoid | string | Elements to avoid in the generated music |
| --use-case | string | Use case context, e.g. "background music for video", "theme song" |
| --structure | string | Song structure, e.g. "verse-chorus-verse-bridge-chorus" |
| --references | string | Reference tracks or artists, e.g. "similar to Ed Sheeran" |
| --extra | string | Additional fine-grained requirements |
| --aigc-watermark | boolean | Embed AI-generated content watermark |
| --format | string | Audio format (default: mp3) |
| --sample-rate | number | Sample rate (default: 44100) |
| --bitrate | number | Bitrate (default: 256000) |
| --out | string | Save audio to file |
| --stream | boolean | Stream raw audio to stdout |

At least one of --prompt or --lyrics is required.
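The flag constraints above can be checked before a request is sent. The sketch below is a hypothetical pre-flight helper, not part of mmx. It assumes that --lyrics-file counts as providing lyrics and that flags are passed as separate arguments (the --flag=value form is not handled). It returns 2 to mirror the usage-error exit code listed in this guide.

```shell
# Hypothetical pre-flight check, not part of mmx: validates the flag
# combinations described for `mmx music generate`.
# Assumption: --lyrics-file counts as providing lyrics.
check_music_flags() {
  _prompt=0; _lyrics=0; _instrumental=0; _optimizer=0
  # Only flag names passed as separate arguments are recognised.
  for _arg in "$@"; do
    case "$_arg" in
      --prompt) _prompt=1 ;;
      --lyrics|--lyrics-file) _lyrics=1 ;;
      --instrumental) _instrumental=1 ;;
      --lyrics-optimizer) _optimizer=1 ;;
    esac
  done
  if [ "$_optimizer" -eq 1 ] && { [ "$_lyrics" -eq 1 ] || [ "$_instrumental" -eq 1 ]; }; then
    echo "--lyrics-optimizer cannot be combined with --lyrics or --instrumental" >&2
    return 2
  fi
  if [ "$_instrumental" -eq 1 ] && [ "$_lyrics" -eq 1 ]; then
    echo "--instrumental cannot be combined with --lyrics" >&2
    return 2
  fi
  if [ "$_lyrics" -eq 0 ] && [ "$_instrumental" -eq 0 ] && [ "$_optimizer" -eq 0 ]; then
    echo "--lyrics is required unless --instrumental or --lyrics-optimizer is used" >&2
    return 2
  fi
  if [ "$_prompt" -eq 0 ] && [ "$_lyrics" -eq 0 ]; then
    echo "at least one of --prompt or --lyrics is required" >&2
    return 2
  fi
  return 0
}
```

A caller might run check_music_flags "$@" && mmx music generate "$@" so that invalid combinations fail locally without an API call.
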

With lyrics

mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

Auto-generate lyrics from prompt

mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out summer.mp3 --quiet

Instrumental

mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet

Detailed prompt with vocal characteristics

mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3

music cover

Generate a cover version of a song based on reference audio. Model: music-cover-free (unlimited for API key users, RPM = 3).

mmx music cover --prompt <text> (--audio <url> | --audio-file <path>) [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --prompt | string, required | Target cover style, e.g. "Indie folk, acoustic guitar, warm male vocal" |
| --audio | string | URL of reference audio (mp3, wav, flac, etc.; 6s to 6min, max 50MB) |
| --audio-file | string | Local reference audio file (auto base64-encoded) |
| --lyrics | string | Cover lyrics. If omitted, extracted from reference audio via ASR. |
| --lyrics-file | string | Read lyrics from file. Use - for stdin |
| --seed | number | Random seed 0–1000000 for reproducible results |
| --format | string | Audio format: mp3, wav, pcm (default: mp3) |
| --sample-rate | number | Sample rate (default: 44100) |
| --bitrate | number | Bitrate (default: 256000) |
| --channel | number | Channels: 1 (mono) or 2 (stereo, default) |
| --out | string | Save audio to file |
| --stream | boolean | Stream raw audio to stdout |
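The 50MB limit on reference audio can be checked locally before a file is passed to --audio-file. This is a minimal sketch, not part of mmx: the helper name check_cover_audio is hypothetical, "50MB" is assumed to mean 50 * 1024 * 1024 bytes, and the 6s to 6min duration limit is not checked.

```shell
# Hypothetical helper, not part of mmx: checks a local reference file
# against the documented 50MB limit before it is used with --audio-file.
# Assumption: "50MB" is treated as 50 * 1024 * 1024 bytes.
MAX_COVER_BYTES=$((50 * 1024 * 1024))

check_cover_audio() {
  if [ ! -f "$1" ]; then
    echo "not a file: $1" >&2
    return 1
  fi
  # File size in bytes; the arithmetic expansion strips wc padding.
  _size=$(( $(wc -c < "$1") ))
  if [ "$_size" -gt "$MAX_COVER_BYTES" ]; then
    echo "$1 is $_size bytes, limit is $MAX_COVER_BYTES" >&2
    return 1
  fi
  return 0
}
```

For example: check_cover_audio original.mp3 && mmx music cover --prompt "Jazz, piano, slow" --audio-file original.mp3 --out jazz_cover.mp3 --quiet.
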

Cover from URL

mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
  --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet

Cover from local file with custom lyrics

mmx music cover --prompt "Jazz, piano, slow" \
  --audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet

Reproducible result with seed

mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3

vision describe

Image understanding via VLM. Provide either --image or --file-id, not both.

mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]

| Flag | Type | Description |
| --- | --- | --- |
| --image | string | Local path or URL (auto base64-encoded) |
| --file-id | string | Pre-uploaded file ID (skips base64) |
| --prompt | string | Question about the image (default: "Describe the image.") |

mmx vision describe --image photo.jpg --prompt "What breed?" --output json

stdout: description text (text mode) or full response (json mode).

search query

Web search via MiniMax.

mmx search query --q <query>

| Flag | Type | Description |
| --- | --- | --- |
| --q | string, required | Search query |

mmx search query --q "MiniMax AI" --output json --quiet

quota show

Display Token Plan usage and remaining quotas.

mmx quota show [--output json]

Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

All tool-worthy commands (excludes auth/config/update)

mmx config export-schema

Single command

mmx config export-schema --command "video generate"

Use this to dynamically register mmx commands as tools in your agent framework.

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Authentication error |
| 4 | Quota exceeded |
| 5 | Timeout |
| 10 | Content filter triggered |
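In a script, the exit codes listed here can be turned into readable labels for logging or branching. The function below is an illustrative sketch, not part of mmx. The codes come from this guide; the function name and label strings are assumptions.

```shell
# Maps the exit codes documented in this guide to short labels.
# The function name and label strings are illustrative, not part of mmx.
mmx_exit_meaning() {
  case "$1" in
    0)  echo "success" ;;
    1)  echo "general error" ;;
    2)  echo "usage error" ;;
    3)  echo "authentication error" ;;
    4)  echo "quota exceeded" ;;
    5)  echo "timeout" ;;
    10) echo "content filter triggered" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}
```

A typical use is to capture the status right after a call, for example: mmx text chat --message "Hi" --quiet; mmx_exit_meaning $?
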
Piping Patterns

stdout is always clean data, safe to pipe

mmx text chat --message "Hi" --output json | jq '.content'

stderr has progress/spinners — discard if needed

mmx video generate --prompt "Waves" 2>/dev/null
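The agent flags and the stderr redirect can be bundled into one wrapper so that every call in a script behaves the same way. This is a sketch under assumptions: the wrapper name mmx_agent is hypothetical, and it assumes --output json is accepted by every command it is used with.

```shell
# Hypothetical wrapper, not part of mmx: appends the agent flags from this
# guide and discards stderr so only data reaches stdout.
# The exit status of mmx is passed through unchanged.
mmx_agent() {
  mmx "$@" --non-interactive --quiet --output json 2>/dev/null
}
```

Note that discarding stderr also hides error messages, so callers may prefer to redirect it to a log file and rely on the exit code for control flow.
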

Chain: generate image → describe it

URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet
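The image-to-description chain can be wrapped in a function that stops early when no URL comes back. This is a sketch, not part of mmx: the function name is hypothetical, and it relies on the documented behaviour that quiet mode prints one image URL per line, taking only the first.

```shell
# Hypothetical helper, not part of mmx: generates one image and asks the
# vision command to describe it. Relies on quiet mode printing one URL
# per line; only the first URL is used.
generate_and_describe() {
  _url="$(mmx image generate --prompt "$1" --quiet --non-interactive | head -n 1)"
  if [ -z "$_url" ]; then
    echo "no image URL returned" >&2
    return 1
  fi
  mmx vision describe --image "$_url" --quiet --non-interactive
}
```

For example, generate_and_describe "A sunset" prints the description on stdout, or returns 1 if image generation produced no output.
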
Async video workflow

TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
mmx video download --task-id "$TASK" --out robot.mp4

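Between submitting a task with --async and downloading the result, a script usually needs to wait until the task has finished. The helper below is a generic retry loop, not part of mmx. It runs any check command until it succeeds or an attempt budget runs out; the name poll_until and its arguments are assumptions.

```shell
# Generic retry helper, not part of mmx: runs a check command until it
# succeeds or the attempt budget is used up.
# Usage: poll_until <max_attempts> <interval_seconds> <command> [args...]
poll_until() {
  _max="$1"
  _interval="$2"
  shift 2
  _attempt=0
  while [ "$_attempt" -lt "$_max" ]; do
    # A zero exit status from the check command ends the loop.
    if "$@"; then
      return 0
    fi
    _attempt=$((_attempt + 1))
    sleep "$_interval"
  done
  return 1
}
```

The check command would typically call mmx video task get --task-id "$TASK" --output json and inspect the task status with jq. The status field name and its terminal value are not specified in this guide, so any such check is a placeholder to adapt to the actual JSON response.
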
Configuration Precedence

CLI flags → environment variables → ~/.mmx/config.json → defaults.

Persistent config

mmx config set --key region --value cn
mmx config show

Environment

export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
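The precedence order can be illustrated for a single setting. The sketch below mirrors the documented chain for the region and is not how mmx is implemented. The function name, its arguments, and the "global" fallback are illustrative (the guide states that the region is auto-detected by default).

```shell
# Sketch of the documented precedence for one setting (region):
# CLI flag -> environment variable -> config file value -> default.
# The function, its arguments, and the "global" fallback are illustrative;
# mmx resolves this internally and auto-detects the region by default.
resolve_region() {
  _flag_value="$1"     # value passed via --region, or empty
  _config_value="$2"   # value read from ~/.mmx/config.json, or empty
  if [ -n "$_flag_value" ]; then
    echo "$_flag_value"
  elif [ -n "${MINIMAX_REGION:-}" ]; then
    echo "$MINIMAX_REGION"
  elif [ -n "$_config_value" ]; then
    echo "$_config_value"
  else
    echo "global"
  fi
}
```

With MINIMAX_REGION=cn exported, resolve_region "" "global" prints cn, while resolve_region "global" "cn" prints global because the flag value takes priority.
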
Default Model Configuration

Set per-modality defaults so you don't need --model every time:

Set defaults

mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
mmx config set --key default-music-model --value music-2.6

Use without --model

mmx text chat --message "Hello"
mmx speech synthesize --text "Hello" --out hello.mp3
mmx video generate --prompt "Ocean waves"
mmx music generate --prompt "Upbeat pop" --instrumental

--model still overrides per-call

mmx text chat --model MiniMax-M2.7 --message "Hello"

Resolution priority: --model flag > config default > hardcoded fallback.
