Confirm? (press enter to confirm, or tell me what to change)
Call mmx
Generate the music directly.
Step 2: Advanced Control Mode
Goal: The user has full control over every parameter before generation.
Lyrics phase:
- If the user provided lyrics: display them formatted with section markers and ask for edits. The final lyrics are passed to mmx via --lyrics.
- If the user has a theme but no lyrics: use --lyrics-optimizer to auto-generate them.
- Support iterative editing: "change the second chorus" -> rewrite only that section.
- The user can also write the lyrics themselves and pass them via --lyrics.
Prompt phase:
- Generate a recommended prompt based on the lyrics' mood and content.
- Present it as editable tags the user can add, remove, or modify.
- Refer to the Prompt Writing Guide appendix for the full vocabulary.
Advanced planning (optional; offer but don't force):
- Song structure: verse-chorus-verse-chorus-bridge-chorus, or custom
- BPM suggestion (encode in the prompt as a tempo descriptor)
- Reference style: "something like X style" -> map to prompt tags
- Vocal character description
Final confirmation: Show the complete parameter summary, then generate.
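The BPM item in the planning list says to encode tempo in the prompt as a descriptor. A minimal sketch of such a mapping is shown here. The function name `tempo_descriptor`, the cutoffs (80 and 120 BPM), and the descriptor wording are all illustrative assumptions; they are not taken from the Prompt Writing Guide, which may define its own ranges.

```shell
# Hypothetical helper: turn a numeric BPM into a tempo phrase for the prompt.
# The 80/120 cutoffs and the wording are assumptions made for illustration.
tempo_descriptor() {
  bpm=$1
  if [ "$bpm" -lt 80 ]; then
    echo "slow tempo"
  elif [ "$bpm" -lt 120 ]; then
    echo "mid-tempo"
  else
    echo "fast tempo"
  fi
}
```

For example, `tempo_descriptor 100` yields a phrase that can be appended to the editable prompt tags before the final confirmation.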
Step 3: Call mmx
Generate music using the mmx CLI:
Vocal with auto-generated lyrics:
    mmx music generate \
      --prompt "" \
      --lyrics-optimizer \
      --genre "" --mood "" --vocals "" \
      --instruments "" --bpm <bpm> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
Vocal with user-provided lyrics:
    mmx music generate \
      --prompt "" \
      --lyrics "" \
      --genre "" --mood "" --vocals "" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
Instrumental (no vocal):
    mmx music generate \
      --prompt "" \
      --instrumental \
      --genre "" --mood "" --instruments "" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
Use structured flags (--genre, --mood, --vocals, --instruments, --bpm, --key, --tempo, --structure, --references, --avoid, --use-case) to give the API fine-grained control instead of cramming everything into --prompt.
Display a progress indicator while waiting. Typical generation takes 30-120 seconds.
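One way to follow the structured-flags advice is to pass each flag only when a value was actually collected. The sketch below assumes the parameters live in shell variables named PROMPT, GENRE, MOOD, VOCALS, INSTRUMENTS, and BPM; those names, the function `run_generate`, and the `MMX_BIN` override are hypothetical. The lyrics-related flags (--lyrics, --lyrics-optimizer, --instrumental) are left out for brevity.

```shell
# Sketch: call `mmx music generate`, adding a structured flag only when its
# variable is non-empty. MMX_BIN is a hypothetical override; setting it to
# `echo` prints the arguments instead of calling mmx (a dry run).
run_generate() {
  out_file=$1
  set -- music generate --prompt "${PROMPT:-}"
  if [ -n "${GENRE:-}" ]; then set -- "$@" --genre "$GENRE"; fi
  if [ -n "${MOOD:-}" ]; then set -- "$@" --mood "$MOOD"; fi
  if [ -n "${VOCALS:-}" ]; then set -- "$@" --vocals "$VOCALS"; fi
  if [ -n "${INSTRUMENTS:-}" ]; then set -- "$@" --instruments "$INSTRUMENTS"; fi
  if [ -n "${BPM:-}" ]; then set -- "$@" --bpm "$BPM"; fi
  "${MMX_BIN:-mmx}" "$@" --out "$out_file" --quiet --non-interactive
}
```

Building the argument list with `set --` keeps values containing spaces intact, which a single concatenated string would not.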
Step 4: Playback
After generation, detect an available audio player and play the file.
Detect player:
    command -v mpv || command -v ffplay || command -v afplay
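The same detection can be wrapped in a function that reports only the player name, which is easier to branch on than a path. This is a minimal sketch; the function name `detect_player` is an assumption, and the priority order is the one used in this step.

```shell
# Print the first available player in priority order (mpv, ffplay, afplay).
# Prints nothing and returns 1 when none of them is on PATH.
detect_player() {
  for player in mpv ffplay afplay; do
    if command -v "$player" >/dev/null 2>&1; then
      printf '%s\n' "$player"
      return 0
    fi
  done
  return 1
}
```

A caller might use it as `player=$(detect_player) || player=""` and treat the empty value as the "None found" case.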
Play based on detected player (in priority order):
| Player | Command | Controls |
|---|---|---|
| mpv (preferred) | mpv --no-video ~/Music/minimax-gen/<filename>.mp3 | space = pause/resume, q = quit, left/right = seek |
| ffplay | ffplay -nodisp -autoexit ~/Music/minimax-gen/<filename>.mp3 | q = quit |
| afplay (macOS) | afplay ~/Music/minimax-gen/<filename>.mp3 | Ctrl+C = stop |
| None found | Do not attempt playback | Show file path only |
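The player table can be expressed as a single dispatch function. The sketch below takes a player name and a file path and starts playback in the background, falling back to printing the path when no player is given. The name `play_file` and the `DRY_RUN` switch are hypothetical; the player commands are the ones listed in the table.

```shell
# Sketch: start playback with the given player, or report the file path when
# the player is empty or unknown. With DRY_RUN set (a hypothetical switch),
# the command is printed instead of executed.
play_file() {
  player=$1
  file=$2
  case "$player" in
    mpv)    set -- mpv --no-video "$file" ;;
    ffplay) set -- ffplay -nodisp -autoexit "$file" ;;
    afplay) set -- afplay "$file" ;;
    *)
      printf 'No audio player detected.\nFile saved to: %s\n' "$file"
      return 1
      ;;
  esac
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    # Run in the background so the session is not blocked during playback.
    "$@" >/dev/null 2>&1 &
  fi
}
```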
After starting playback, tell the user (localize all text):
Now playing: <filename>.mp3
Saved to: ~/Music/minimax-gen/<filename>.mp3
Do NOT show playback controls (e.g. keyboard shortcuts) — they don't work in this
environment since the player runs in the background.
If no player is found (localize all text):
No audio player detected.
File saved to: ~/Music/minimax-gen/<filename>.mp3
Tip: Install mpv for the best playback experience (brew install mpv).
Step 5: Feedback & Iteration
After playback, ask for feedback:
How was this song?
1. Love it, keep it!
2. Not quite, adjust and regenerate
3. Fine-tune lyrics/style then regenerate
4. Don't want it, start over
Based on feedback:
- Satisfied: Done. Mention the file path again.
- Adjust & regenerate: Ask what to change (prompt? lyrics? style?), apply the edits, and re-run generation. Keep the old file with a _v1 suffix for comparison.
- Fine-tune: Enter Advanced Control Mode with the current parameters pre-filled.
- Delete & restart: Remove the file and go back to Step 0.
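Keeping the old file for comparison amounts to a rename before regenerating. A minimal sketch follows; the function name `keep_old_version` is hypothetical. The text only mentions a _v1 suffix, so counting upward to _v2, _v3, and so on when _v1 is already taken is an assumption added here to avoid overwriting earlier versions.

```shell
# Rename <name>.<ext> to <name>_vN.<ext>, using the first N that is free,
# and print the new path. Counting past _v1 is an assumption of this sketch.
keep_old_version() {
  file=$1
  base=${file%.*}     # path without the final extension
  ext=${file##*.}     # final extension, e.g. mp3
  n=1
  while [ -e "${base}_v${n}.${ext}" ]; do
    n=$((n + 1))
  done
  mv -- "$file" "${base}_v${n}.${ext}"
  printf '%s\n' "${base}_v${n}.${ext}"
}
```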
Cover Mode
Generate a cover version of a song based on reference audio. Model: music-cover-free.
Reference audio requirements: mp3, wav, or flac; duration 6 s to 6 min; max 50 MB.
If no lyrics are provided, the original lyrics are extracted via ASR automatically.
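For local files, the format and size requirements can be checked before calling mmx. The sketch below is one possible pre-flight check; the function name `check_reference_audio` is hypothetical, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption. The duration limit is not checked because that needs an external probing tool, and the extension match is case-sensitive.

```shell
# Sketch: validate a local reference file by extension and size.
# Assumes 50 MB means 52428800 bytes; duration (6 s to 6 min) is not checked.
check_reference_audio() {
  file=$1
  case "$file" in
    *.mp3|*.wav|*.flac) ;;
    *) echo "unsupported format: $file" >&2; return 1 ;;
  esac
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt 52428800 ]; then
    echo "file larger than 50 MB: $file" >&2
    return 1
  fi
  return 0
}
```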
Workflow
When the user selects Cover mode:
1. Ask for the source audio: a local file path or a URL.
2. Ask for the target cover style (e.g., "acoustic cover, stripped-down, intimate vocal").
3. Optionally ask for custom lyrics or a lyrics file.
Commands
Cover from local file:
    mmx music cover \
      --prompt "" \
      --audio-file <source.mp3> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
Cover from URL:
    mmx music cover \
      --prompt "" \
      --audio <source_url> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
With custom lyrics (text):
    mmx music cover \
      --prompt "<style>" \
      --audio-file <source.mp3> \
      --lyrics "" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
With custom lyrics (file):
    mmx music cover \
      --prompt "<style>" \
      --audio-file <source.mp3> \
      --lyrics-file <lyrics.txt> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive
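Since the source audio may be either a local path or a URL, the choice between --audio-file and --audio can be made from the value itself. The sketch below assumes that anything starting with http:// or https:// is a URL and everything else is a local file; the function names and the `MMX_BIN` override are hypothetical. Custom lyrics flags are omitted.

```shell
# Pick --audio for http(s) URLs and --audio-file for anything else.
audio_source_flag() {
  case "$1" in
    http://*|https://*) printf '%s\n' "--audio" ;;
    *)                  printf '%s\n' "--audio-file" ;;
  esac
}

# Sketch: run `mmx music cover` with the flag that matches the source.
# MMX_BIN is a hypothetical override; set it to `echo` for a dry run.
run_cover() {
  style=$1
  src=$2
  out_file=$3
  "${MMX_BIN:-mmx}" music cover --prompt "$style" \
    "$(audio_source_flag "$src")" "$src" \
    --out "$out_file" --quiet --non-interactive
}
```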
Optional flags
| Flag | Description |
|---|---|
| --seed | Random seed 0-1000000 for reproducible results |
| --channel | 1 (mono) or 2 (stereo, default) |
| --format | mp3 (default), wav, pcm |
| --sample-rate | Sample rate in Hz (default: 44100) |
| --bitrate | Bitrate in bps (default: 256000) |
After generation
Proceed with the normal playback and feedback flow (Steps 4 and 5).
Error Handling
| Error | Action |
|---|---|
| mmx not found | npm install -g mmx-cli |
| mmx auth error (exit code 3) | mmx auth login |
| Quota exceeded (exit code 4) | Report quota limit, suggest waiting or upgrading |
| API timeout (exit code 5) | Retry once, then report failure |
| Content filter (exit code 10) | Adjust prompt to avoid filtered content |
| Invalid lyrics format | Auto-fix section markers, warn user |
| No audio player found | Save file and tell user the path, suggest installing mpv |
| Network error | Show error detail, suggest checking connection |
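The exit codes in the table map naturally onto a `case` statement, and the "retry once" rule for timeouts onto a small wrapper. Both functions below are a sketch with hypothetical names; the exit codes 3, 4, 5, and 10 come from the table, while treating 0 as success is the usual shell convention and assumed here.

```shell
# Map an mmx exit code to the action listed in the error table.
mmx_exit_action() {
  case "$1" in
    0)  echo "success" ;;
    3)  echo "auth error: run 'mmx auth login'" ;;
    4)  echo "quota exceeded: report the limit, suggest waiting or upgrading" ;;
    5)  echo "API timeout: retry once, then report failure" ;;
    10) echo "content filter: adjust the prompt" ;;
    *)  echo "unrecognized exit code $1: show the error detail" ;;
  esac
}

# Run a command; if it exits with 5 (timeout), run it exactly one more time.
# Returns the exit status of the last attempt.
retry_once_on_timeout() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 5 ]; then
    status=0
    "$@" || status=$?
  fi
  return "$status"
}
```

A caller could combine them as `retry_once_on_timeout mmx music generate ... || mmx_exit_action $?`.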
Important Notes
Never reproduce copyrighted lyrics. When doing covers, always write original lyrics inspired by the song's theme. Explain this to the user.
Prompt language: The API prompt works best with English tags. Chinese tags are also acceptable, and mixing the two is fine.
Section markers in lyrics: The API recognizes [verse], [chorus], [bridge], [outro], and [intro]. Always include them when providing --lyrics.
File management: If ~/Music/minimax-gen/ has more than 50 files, suggest cleanup when starting a new session.
Structured params: Prefer --genre, --mood, --vocals, --instruments, --bpm, etc. over embedding everything in --prompt. This gives the API better control.
Lyrics language via style: When the user wants lyrics in a specific language, express it through the vocal description or genre (e.g., "Japanese female vocalist", "Mandopop ballad") rather than appending a language directive to the prompt.
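Two of these notes, section markers and file management, lend themselves to small checks. The sketch below assumes markers are written in lowercase exactly as listed and that only files directly inside the output directory count toward the 50-file threshold; all function names are hypothetical. `grep -o` and `find -maxdepth` are widely available extensions rather than strict POSIX.

```shell
# Succeeds when the lyrics contain at least one recognized section marker.
has_section_markers() {
  printf '%s\n' "$1" | grep -Eq '\[(intro|verse|chorus|bridge|outro)\]'
}

# Prints bracketed markers outside the recognized set (e.g. [hook]),
# one per line. Prints nothing when every marker is recognized.
unknown_markers() {
  printf '%s\n' "$1" | grep -Eo '\[[^]]*\]' \
    | grep -Ev '^\[(intro|verse|chorus|bridge|outro)\]$' || true
}

# Count regular files directly inside the output directory.
count_generated_files() {
  dir=${1:-"$HOME/Music/minimax-gen"}
  if [ ! -d "$dir" ]; then
    echo 0
    return 0
  fi
  find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Succeeds when the directory holds more than 50 files.
needs_cleanup() {
  [ "$(count_generated_files "${1:-}")" -gt 50 ]
}
```

The output of `unknown_markers` gives a starting point for the auto-fix and warning described under Error Handling.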
Appendix: Prompt Writing Guide
See references/prompt_guide.md for the complete prompt writing guide, including genre/vocal/instrument references and BPM tables.