# MiniMax CLI — Agent Skill Guide

Use `mmx` to generate text, images, video, speech, and music, and to perform web search via the MiniMax AI platform.

## Prerequisites
```bash
# Install
npm install -g mmx-cli

# Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json)
mmx auth login --api-key sk-xxxxx

# Verify active auth source
mmx auth status

# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"
```

Region is auto-detected. Override with `--region global` or `--region cn`.

## Agent Flags

Use these flags in non-interactive (agent/CI) contexts as the task requires:

| Flag | Purpose |
|---|---|
| `--non-interactive` | Fail fast on missing args instead of prompting |
| `--quiet` | Suppress spinners/progress; stdout is pure data |
| `--output json` | Machine-readable JSON output |
| `--async` | Return task ID immediately (video generation) |
| `--dry-run` | Preview the API request without executing |
| `--yes` | Skip confirmation prompts |

## Commands

### text chat

Chat completion. Default model: `MiniMax-M2.7`.

```bash
mmx text chat --message <text> [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--message` | string, required, repeatable | Message text. Prefix with `role:` to set the role (e.g. `"system:You are helpful"`, `"user:Hello"`) |
| `--messages-file` | string | JSON file with messages array. Use `-` for stdin |
| `--system` | string | System prompt |
| `--model` | string | Model ID (default: `MiniMax-M2.7`) |
| `--max-tokens` | number | Max tokens (default: 4096) |
| `--temperature` | number | Sampling temperature in (0.0, 1.0] |
| `--top-p` | number | Nucleus sampling threshold |
| `--stream` | boolean | Stream tokens (default: on in TTY) |
| `--tool` | string, repeatable | Tool definition JSON or file path |
```bash
# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet

# System prompt + user message
mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

# From file (via stdin)
cat conversation.json | mmx text chat --messages-file - --output json
# stdout: response text (text mode) or full response object (json mode)
```
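For multi-turn conversations `--message` can be repeated, and collecting the role-prefixed messages in a bash array keeps each one intact even when it contains spaces. A minimal sketch under stated assumptions: `chat_json` and the `MMX_BIN` override are hypothetical names introduced here (the override exists only so the function can be exercised with a stub), and the `assistant:` role prefix in the usage comment is assumed, since only `system:` and `user:` appear in the flag table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `mmx text chat` with one --message per argument,
# plus the agent-oriented flags from "Agent Flags".
chat_json() {
  local -a args=(text chat --non-interactive --quiet --output json)
  local msg
  for msg in "$@"; do
    args+=(--message "$msg")   # a role prefix such as "user:" passes through as-is
  done
  # MMX_BIN (hypothetical) lets tests substitute a stub; defaults to mmx.
  "${MMX_BIN:-mmx}" "${args[@]}"
}

# Usage (the "assistant:" prefix is an assumption):
# chat_json "system:You are terse." "user:Name a prime." "assistant:7" "user:Another?"
```

Quoting `"${args[@]}"` is what preserves each message as a single argument; building the command as one string would break on embedded spaces or quotes.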
### image generate

Generate images. Model: `image-01`.

```bash
mmx image generate --prompt <text> [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--prompt` | string, required | Image description |
| `--aspect-ratio` | string | e.g. `16:9`, `1:1` |
| `--n` | number | Number of images (default: 1) |
| `--subject-ref` | string | Subject reference: `type=character,image=path-or-url` |
| `--out-dir` | string | Download images to directory |
| `--out-prefix` | string | Filename prefix (default: `image`) |

```bash
mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
```
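Because `--quiet` prints one URL or saved path per line, a `while read` loop can handle each result without word-splitting surprises. A minimal sketch; `generate_images` and the `MMX_BIN` override are hypothetical, and the `printf 'saved: ...'` line is a placeholder for real per-file handling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: generate images into a directory, then handle each
# saved path that mmx prints (one per stdout line with --quiet).
generate_images() {
  local prompt=$1 count=$2 dir=$3
  local file n=0
  while IFS= read -r file; do
    [ -n "$file" ] || continue      # ignore blank lines, if any
    n=$((n + 1))
    printf 'saved: %s\n' "$file"    # placeholder for real per-file handling
  done < <("${MMX_BIN:-mmx}" image generate --prompt "$prompt" --n "$count" \
             --out-dir "$dir" --quiet --non-interactive)
  printf 'total: %d\n' "$n"
}

# Usage: generate_images "Logo" 3 ./gen/
```

Reading through process substitution (rather than a pipe into `while`) keeps the loop in the current shell, so the counter survives after the loop ends.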
### video generate

Generate video. Default model: `MiniMax-Hailuo-2.3`. This is an async task; by default the CLI polls until completion.

```bash
mmx video generate --prompt <text> [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--prompt` | string, required | Video description |
| `--model` | string | `MiniMax-Hailuo-2.3` (default) or `MiniMax-Hailuo-2.3-Fast` |
| `--first-frame` | string | First frame image |
| `--callback-url` | string | Webhook URL for completion |
| `--download` | string | Save video to specific file |
| `--async` | boolean | Return task ID immediately |
| `--no-wait` | boolean | Same as `--async` |
| `--poll-interval` | number | Polling interval (default: 5) |

```bash
# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout:

# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4
```
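With `--async` (or `--no-wait`), the caller is responsible for waiting. A minimal polling sketch built on `mmx video task get`: this guide does not describe the JSON that command returns, so the readiness test (`task_done`, which looks for the string `"Success"`) is a placeholder to adapt. `poll_until`, `task_done`, and `MMX_BIN` are hypothetical names, and returning 5 when polling gives up is a choice made to echo the CLI's Timeout exit code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it exits 0, sleeping between
# attempts; give up after max_tries and return 5 (like the CLI's Timeout code).
poll_until() {
  local interval=$1 max_tries=$2
  shift 2
  local i
  for ((i = 1; i <= max_tries; i++)); do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$max_tries" ]; then
      sleep "$interval"
    fi
  done
  return 5
}

# Placeholder readiness check: the response shape of `video task get` is an
# assumption here, so adapt the grep pattern to the real status field.
task_done() {
  "${MMX_BIN:-mmx}" video task get --task-id "$1" --output json --quiet \
    | grep -q '"Success"'
}
```

For example, `poll_until 5 120 task_done "$TASK"` checks every 5 seconds for up to 120 attempts. A fixed interval is the simplest choice; exponential backoff would reduce traffic on long jobs.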
### video task get

Query status of a video generation task.

```bash
mmx video task get --task-id <id> [--output json]
```

### video download

Download a completed video by task ID.

```bash
mmx video download --file-id <id> [--out <path>]
```

### speech synthesize

Text-to-speech. Default model: `speech-2.8-hd`. Max 10k characters.

```bash
mmx speech synthesize --text <text> [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--text` | string | Text to synthesize |
| `--text-file` | string | Read text from file. Use `-` for stdin |
| `--model` | string | `speech-2.8-hd` (default), `speech-2.6`, `speech-02` |
| `--voice` | string | Voice ID (default: `English_expressive_narrator`) |
| `--speed` | number | Speed multiplier |
| `--volume` | number | Volume level |
| `--pitch` | number | Pitch adjustment |
| `--format` | string | Audio format (default: `mp3`) |
| `--sample-rate` | number | Sample rate (default: 32000) |
| `--bitrate` | number | Bitrate (default: 128000) |
| `--channels` | number | Audio channels (default: 1) |
| `--language` | string | Language boost |
| `--subtitles` | boolean | Download and save subtitles as an `.srt` file alongside the `--out` audio file. The API must support subtitles for the selected model. |
| `--pronunciation` | string, repeatable | Custom pronunciation |
| `--sound-effect` | string | Add sound effect |
| `--out` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |

```bash
mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3

mmx speech synthesize --text "Hello" --subtitles --out hello.mp3
# saves hello.mp3 + hello.srt (SRT subtitle file)
```
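Synthesis is limited to 10k characters per request, so longer scripts have to be split. A minimal sketch that cuts text into pieces of at most 10,000 characters and synthesizes each to its own numbered file. Assumptions: `split_text`, `synthesize_long`, `MMX_BIN`, and the `part_N.mp3` names are hypothetical; "10k" is taken as 10,000; cuts fall at exact character offsets, so a word or sentence may be split; and merging the parts is left to another tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit $1 in chunks of at most $2 characters as
# NUL-terminated records (NUL keeps embedded newlines intact). ${#text}
# counts characters in a UTF-8 locale, bytes in the C locale.
split_text() {
  local text=$1 size=${2:-10000} i
  for ((i = 0; i < ${#text}; i += size)); do
    printf '%s\0' "${text:i:size}"
  done
}

# Synthesize each chunk to part_1.mp3, part_2.mp3, ... via stdin.
synthesize_long() {
  local text=$1 n=0 chunk
  while IFS= read -r -d '' chunk; do
    n=$((n + 1))
    printf '%s' "$chunk" | "${MMX_BIN:-mmx}" speech synthesize \
      --text-file - --out "part_${n}.mp3" --quiet --non-interactive
  done < <(split_text "$text" 10000)
}

# Usage: synthesize_long "$(cat script.txt)"
```

Splitting on sentence boundaries would sound more natural; fixed offsets are used here only to keep the sketch short.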
```bash
# Read text from stdin
echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
```

### music generate

Generate music. Responds well to rich, structured descriptions. Model: `music-2.6-free` (unlimited for API key users, RPM = 3).

```bash
mmx music generate --prompt <text> [--lyrics <text>] [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--prompt` | string | Music style description (can be detailed) |
| `--lyrics` | string | Song lyrics with structure tags. Required unless `--instrumental` or `--lyrics-optimizer` is used. |
| `--lyrics-file` | string | Read lyrics from file. Use `-` for stdin |
| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with `--lyrics` or `--instrumental`. |
| `--instrumental` | boolean | Generate instrumental music (no vocals). Cannot be used with `--lyrics`. |
| `--vocals` | string | Vocal style, e.g. `"warm male baritone"`, `"bright female soprano"`, `"duet with harmonies"` |
| `--genre` | string | Music genre, e.g. folk, pop, jazz |
| `--mood` | string | Mood or emotion, e.g. warm, melancholic, uplifting |
| `--instruments` | string | Instruments to feature, e.g. `"acoustic guitar, piano"` |
| `--tempo` | string | Tempo description, e.g. fast, slow, moderate |
| `--bpm` | number | Exact tempo in beats per minute |
| `--key` | string | Musical key, e.g. C major, A minor, G sharp |
| `--avoid` | string | Elements to avoid in the generated music |
| `--use-case` | string | Use case context, e.g. `"background music for video"`, `"theme song"` |
| `--structure` | string | Song structure, e.g. `"verse-chorus-verse-bridge-chorus"` |
| `--references` | string | Reference tracks or artists, e.g. `"similar to Ed Sheeran"` |
| `--extra` | string | Additional fine-grained requirements |
| `--aigc-watermark` | boolean | Embed AI-generated content watermark |
| `--format` | string | Audio format (default: `mp3`) |
| `--sample-rate` | number | Sample rate (default: 44100) |
| `--bitrate` | number | Bitrate (default: 256000) |
| `--out` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |

At least one of `--prompt` or `--lyrics` is required.
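The flag rules for `music generate` can be checked before spending a request, which matters when the free model allows only 3 requests per minute. A minimal sketch; `check_music_args` is hypothetical, it treats `--lyrics-file` the same as `--lyrics` (an assumption), and it returns 2 to mirror the CLI's usage-error exit code.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for `mmx music generate` flag combinations.
#   $1 prompt   $2 lyrics text or lyrics file (empty if none)
#   $3 instrumental (1/0)   $4 lyrics-optimizer (1/0)
# Returns 2 with a message on stderr when the combination is invalid.
check_music_args() {
  local prompt=$1 lyrics=$2 instrumental=${3:-0} optimizer=${4:-0}
  if [ -z "$prompt" ] && [ -z "$lyrics" ]; then
    echo "need --prompt or --lyrics" >&2
    return 2
  fi
  if [ "$optimizer" = 1 ] && { [ -n "$lyrics" ] || [ "$instrumental" = 1 ]; }; then
    echo "--lyrics-optimizer cannot be combined with --lyrics or --instrumental" >&2
    return 2
  fi
  if [ "$instrumental" = 1 ] && [ -n "$lyrics" ]; then
    echo "--instrumental cannot be combined with --lyrics" >&2
    return 2
  fi
  if [ -z "$lyrics" ] && [ "$instrumental" != 1 ] && [ "$optimizer" != 1 ]; then
    echo "--lyrics is required unless --instrumental or --lyrics-optimizer is used" >&2
    return 2
  fi
  return 0
}

# Usage: check_music_args "Upbeat pop" "" 1 0 && mmx music generate --prompt "Upbeat pop" --instrumental --out bgm.mp3
```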
```bash
# With lyrics
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

# Auto-generate lyrics from prompt
mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out summer.mp3 --quiet

# Instrumental
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet

# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3
```

### music cover

Generate a cover version of a song based on reference audio. Model: `music-cover-free` (unlimited for API key users, RPM = 3).

```bash
mmx music cover --prompt <text> (--audio <url> | --audio-file <path>) [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--prompt` | string, required | Target cover style, e.g. `"Indie folk, acoustic guitar, warm male vocal"` |
| `--audio` | string | URL of reference audio (mp3, wav, flac, etc.; 6 s to 6 min, max 50 MB) |
| `--audio-file` | string | Local reference audio file (auto base64-encoded) |
| `--lyrics` | string | Cover lyrics. If omitted, extracted from reference audio via ASR. |
| `--lyrics-file` | string | Read lyrics from file. Use `-` for stdin |
| `--seed` | number | Random seed 0–1000000 for reproducible results |
| `--format` | string | Audio format: `mp3`, `wav`, `pcm` (default: `mp3`) |
| `--sample-rate` | number | Sample rate (default: 44100) |
| `--bitrate` | number | Bitrate (default: 256000) |
| `--channel` | number | Channels: 1 (mono) or 2 (stereo, default) |
| `--out` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |
```bash
# Cover from URL
mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
  --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet

# Cover from local file with custom lyrics
mmx music cover --prompt "Jazz, piano, slow" \
  --audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet

# Reproducible result with seed
mmx music cover --prompt "Pop, upbeat" \
  --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 \
  --seed 42 --out cover.mp3
```
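Reference audio for a cover must be 6 s to 6 min long and at most 50 MB. A minimal pre-flight sketch for `--audio-file` that checks existence and size only. Assumptions: the 50 MB limit stated for `--audio` also applies to local files, 1 MB means 1024 * 1024 bytes, and duration is not checked because that would need a media probe such as ffprobe; `audio_size_ok` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file exists and is at most 50 MB.
# Assumption: 1 MB = 1024 * 1024 bytes; duration (6 s to 6 min) is not checked.
audio_size_ok() {
  local file=$1 max_bytes=$((50 * 1024 * 1024)) size
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  size=$(wc -c < "$file")        # byte count that works with GNU and BSD wc
  size=${size//[[:space:]]/}     # BSD wc pads the number with spaces
  [ "$size" -le "$max_bytes" ] || { echo "too large: $size bytes" >&2; return 1; }
}

# Usage:
# audio_size_ok original.mp3 && mmx music cover --prompt "Jazz, piano, slow" \
#   --audio-file original.mp3 --out jazz_cover.mp3 --quiet
```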
### vision describe

Image understanding via VLM. Provide either `--image` or `--file-id`, not both.

```bash
mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]
```

| Flag | Type | Description |
|---|---|---|
| `--image` | string | Local path or URL (auto base64-encoded) |
| `--file-id` | string | Pre-uploaded file ID (skips base64) |
| `--prompt` | string | Question about the image (default: `"Describe the image."`) |

```bash
mmx vision describe --image photo.jpg --prompt "What breed?" --output json
# stdout: description text (text mode) or full response (json mode)
```
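Since `--image` and `--file-id` are mutually exclusive and one of them is required, a wrapper can enforce that choice before calling the CLI. A minimal sketch; `describe_image` and `MMX_BIN` are hypothetical, an empty argument means "not provided", and omitting `--prompt` leaves the CLI's default question in effect.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: call `mmx vision describe` with exactly one image
# source. Args: $1 image path/URL, $2 file ID, $3 prompt (all may be empty).
describe_image() {
  local image=$1 file_id=$2 prompt=$3
  local -a args=(vision describe --quiet --non-interactive)
  if [ -n "$image" ] && [ -n "$file_id" ]; then
    echo "use --image or --file-id, not both" >&2
    return 2
  elif [ -n "$image" ]; then
    args+=(--image "$image")
  elif [ -n "$file_id" ]; then
    args+=(--file-id "$file_id")
  else
    echo "one of --image or --file-id is required" >&2
    return 2
  fi
  if [ -n "$prompt" ]; then
    args+=(--prompt "$prompt")   # otherwise the CLI's default prompt applies
  fi
  "${MMX_BIN:-mmx}" "${args[@]}"
}

# Usage: describe_image photo.jpg "" "What breed?"
```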
### search query

Web search via MiniMax.

```bash
mmx search query --q <query>
```

| Flag | Type | Description |
|---|---|---|
| `--q` | string, required | Search query |

```bash
mmx search query --q "MiniMax AI" --output json --quiet
```

### quota show

Display Token Plan usage and remaining quotas.

```bash
mmx quota show [--output json]
```

## Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

```bash
# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema

# Single command
mmx config export-schema --command "video generate"
```

Use this to dynamically register `mmx` commands as tools in your agent framework.

## Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Authentication error |
| 4 | Quota exceeded |
| 5 | Timeout |
| 10 | Content filter triggered |

## Piping Patterns
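In a pipeline such as `mmx ... | jq`, bash reports only the last command's status by default, so an mmx failure can hide behind a successful `jq`. A minimal sketch that enables `pipefail` and maps the exit codes listed above to labels. `mmx_exit_label` and `run_checked` are hypothetical, and retrying once on a timeout (code 5) is an illustrative policy, not documented CLI behavior.

```bash
#!/usr/bin/env bash
set -o pipefail   # a pipeline fails if any stage fails, not just the last one

# Map the documented exit codes to short labels.
mmx_exit_label() {
  case $1 in
    0) echo success ;;
    1) echo general-error ;;
    2) echo usage-error ;;
    3) echo auth-error ;;
    4) echo quota-exceeded ;;
    5) echo timeout ;;
    10) echo content-filter ;;
    *) echo "unknown($1)" ;;
  esac
}

# Hypothetical wrapper: run "$@", retry once on timeout, report failures.
run_checked() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 5 ]; then
    rc=0
    "$@" || rc=$?
  fi
  [ "$rc" -eq 0 ] || echo "mmx failed: $(mmx_exit_label "$rc")" >&2
  return "$rc"
}

# Usage: wrap a pipeline in a function, then run it through run_checked.
# chat_content() { mmx text chat --message "Hi" --output json | jq '.content'; }
# run_checked chat_content
```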
```bash
# stdout is always clean data, safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'

# stderr has progress/spinners; discard if needed
mmx video generate --prompt "Waves" 2>/dev/null

# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet

# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
mmx video download --task-id "$TASK" --out robot.mp4
```

## Configuration Precedence

CLI flags → environment variables → `~/.mmx/config.json` → defaults.
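This precedence chain amounts to "the first non-empty source wins". A minimal sketch for the region setting; `first_set` and `resolve_region` are hypothetical, the config value is passed in rather than read from `~/.mmx/config.json` (whose layout this guide does not describe), and the final default is supplied by the caller because the CLI itself auto-detects the region.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty argument.
first_set() {
  local v
  for v in "$@"; do
    if [ -n "$v" ]; then
      printf '%s\n' "$v"
      return 0
    fi
  done
  return 1
}

# Region: --region flag value, then MINIMAX_REGION, then a config value
# (passed in by the caller), then a caller-chosen default.
resolve_region() {
  local flag_value=$1 config_value=$2 default_value=$3
  first_set "$flag_value" "${MINIMAX_REGION:-}" "$config_value" "$default_value"
}

# Usage: resolve_region "" "" global
```

The same first-non-empty pattern matches the model rule (`--model` flag, then config default, then hardcoded fallback), just without the environment step.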
```bash
# Persistent config
mmx config set --key region --value cn
mmx config show

# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
```

## Default Model Configuration

Set per-modality defaults so you don't need `--model` every time:
```bash
# Set defaults
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
mmx config set --key default-music-model --value music-2.6

# Use without --model
mmx text chat --message "Hello"
mmx speech synthesize --text "Hello" --out hello.mp3
mmx video generate --prompt "Ocean waves"
mmx music generate --prompt "Upbeat pop" --instrumental

# --model still overrides per-call
mmx text chat --model MiniMax-M2.7 --message "Hello"
```

Resolution priority: `--model` flag > config default > hardcoded fallback.