MiniMax Music Generation Skill

Generate songs (vocal or instrumental) using the MiniMax Music API. Supports two creation modes: Basic (one-sentence-in, song-out) and Advanced Control (edit lyrics, refine prompt, plan before generating).

Prerequisites

- mmx CLI (required): Music generation uses the mmx command-line tool.
  - Check if installed: command -v mmx && mmx --version || echo "mmx not found"
  - Install (requires Node.js): npm install -g mmx-cli
  - Authenticate (first time only): mmx auth login --api-key <your-minimax-api-key>
    The API key can be obtained from MiniMax Platform. Credentials are saved to ~/.mmx/credentials.json and persist across sessions.
  - Verify: mmx quota show
- Audio player (recommended): mpv, ffplay, or afplay (macOS built-in) for local playback. mpv is preferred for its interactive controls.

CLI Tool

This skill uses the mmx CLI for all music generation:

- Music Generation: mmx music generate (model: music-2.6-free)
  - Supports --lyrics-optimizer to auto-generate lyrics from the prompt
  - Supports --instrumental for instrumental tracks
  - Supports --lyrics for user-provided lyrics
  - Structured params: --genre, --mood, --vocals, --instruments, --bpm, --key, --tempo, --structure, --references
- Cover: mmx music cover (model: music-cover-free)
  - Takes reference audio via --audio-file or --audio
  - --prompt describes the target cover style
- Agent flags: Always add --quiet --non-interactive when calling mmx from agents.
- Pipeline:
  - Vocal: User description -> mmx music generate --lyrics-optimizer -> MP3
  - Instrumental: User description -> mmx music generate --instrumental -> MP3
  - Cover: Source audio + style -> mmx music cover -> MP3

Storage

All generated music is saved to ~/Music/minimax-gen/. Create the directory if it doesn't exist. Files are named with a timestamp and a short slug derived from the prompt: YYYYMMDD_HHMMSS_<slug>.mp3
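The naming rule above can be sketched in shell. This is a minimal illustration, not part of the mmx CLI: the helper names slugify and make_out_path, the MINIMAX_OUT_DIR override, the 24-character slug limit, and the "track" fallback are all assumptions made for the example.

```sh
# Hypothetical helpers; not part of mmx.

# Reduce a free-text prompt to a short lowercase ASCII slug.
slugify() {
  printf '%s' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C tr -c 'a-z0-9' '-' \
    | tr -s '-' \
    | cut -c1-24 \
    | sed -e 's/^-//' -e 's/-$//'
}

# Create the output directory if needed and print a timestamped file path.
make_out_path() {
  out_dir=${MINIMAX_OUT_DIR:-"$HOME/Music/minimax-gen"}   # override is an assumption
  slug=$(slugify "$1")
  [ -n "$slug" ] || slug=track   # assumed fallback for prompts without ASCII letters or digits
  mkdir -p "$out_dir"
  printf '%s/%s_%s.mp3\n' "$out_dir" "$(date +%Y%m%d_%H%M%S)" "$slug"
}
```

Calling make_out_path "Sad Piano Piece!" would print a path ending in _sad-piano-piece.mp3, which can then be passed to --out.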

Language & Interaction

Detect the user's language from their first message and respond in that language for the entire session. This applies to all interaction text, questions, confirmations, and feedback prompts.

User-facing text localization rule:

- ALL text shown to the user, including preview labels, field names, confirmations, status messages, playback info, feedback prompts, and the prompt/description preview, MUST be fully translated into the user's language.
- The API prompt sent to the model should always be written in English for best generation quality. However, when previewing the prompt to the user, show a localized description in the user's language instead of the raw English prompt. The English prompt is an internal implementation detail; the user does not need to see it.
- The templates below are written in English as reference. At runtime, translate every label and message into the user's detected language.

Lyrics language rule:

- Default lyrics language = the user's language. A Chinese-speaking user gets Chinese lyrics; an English-speaking user gets English lyrics.
- Only generate lyrics in a different language if the user explicitly requests it.
- When a different lyrics language is needed, embed it naturally into the vocal or genre description in the prompt. For example, instead of appending "with Korean lyrics", use "featuring a Korean female vocalist" or specify a genre that implies the language (e.g., "K-pop", "J-rock", "Mandopop", "Latin pop").

Workflow

Step 0: Detect Intent

Parse the user's message to determine:

- Song category: vocal (with lyrics), instrumental (no vocals), or cover
- Creation mode preference: did they provide detailed requirements (Advanced) or a casual one-liner (Basic)?

If ambiguous, ask using this decision tree:

    Q1: What type of music?
      - Vocal (with lyrics)
      - Instrumental (no vocals)
      - Cover

    Q2: Creation mode?
      - Basic: one-line description, auto-generate
      - Advanced: edit lyrics, refine prompt, plan

If the user gives a clear one-liner like "make me a sad piano piece", skip the questions, infer instrumental + basic mode, and proceed.

Step 1: Basic Mode

Goal: The user provides a short description, the skill auto-generates everything, then calls the API.

1. Expand the description into a prompt. Take the user's one-liner and expand it into a rich music prompt. Refer to the Prompt Writing Guide appendix at the end of this document for style vocabulary, genre/instrument references, and prompt structure. The API prompt should always be written in English for best generation quality, regardless of the user's language. Follow this pattern:

    A [mood] [BPM optional] [genre] song, featuring [vocal description],
    about [narrative/theme], [atmosphere], [key instruments and production].

2. Show the user a preview before generating. Translate all labels AND the prompt description into the user's language. The English prompt is only used internally when calling the API; the user should never see it. Example template (English reference; localize everything at runtime):

    About to generate:
    Type: Vocal / Instrumental
    Description: indie folk, melancholy, acoustic guitar, gentle female voice
    Lyrics: Auto-generated (--lyrics-optimizer)
    Confirm? (press enter to confirm, or tell me what to change)

3. Call mmx. Generate the music directly (see Step 3).
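As an illustration of the prompt pattern in this step, the following sketch fills the bracketed slots from positional arguments. The function name build_prompt, the argument order, and the "<n> BPM" wording for the optional tempo slot are assumptions for the example; the pattern itself is the one given above.

```sh
# Hypothetical helper: assemble the English API prompt from its parts.
# Usage: build_prompt MOOD BPM GENRE VOCALS THEME ATMOSPHERE PRODUCTION
# Pass an empty string for BPM to leave the tempo out.
build_prompt() {
  mood=$1 bpm=$2 genre=$3 vocals=$4 theme=$5 atmosphere=$6 production=$7
  tempo=""
  if [ -n "$bpm" ]; then
    tempo=" ${bpm} BPM"   # assumed phrasing for the optional BPM slot
  fi
  printf 'A %s%s %s song, featuring %s, about %s, %s, %s.\n' \
    "$mood" "$tempo" "$genre" "$vocals" "$theme" "$atmosphere" "$production"
}

build_prompt "melancholy" "72" "indie folk" "a gentle female voice" \
  "leaving a hometown" "intimate and warm" \
  "fingerpicked acoustic guitar and soft strings"
```

The resulting string is what would go to --prompt; the user-facing preview would still show a localized description instead.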

Step 2: Advanced Control Mode

Goal: The user has full control over every parameter before generation.

1. Lyrics phase:
   - If the user provided lyrics: display them formatted with section markers and ask for edits. The final lyrics will be passed via --lyrics to mmx.
   - If the user has a theme but no lyrics: use --lyrics-optimizer to auto-generate them.
   - Support iterative editing: "change the second chorus" -> only rewrite that section.
   - The user can also write lyrics themselves and pass them via --lyrics.
2. Prompt phase:
   - Generate a recommended prompt based on the lyrics' mood and content.
   - Present it as editable tags the user can add/remove/modify.
   - Refer to the Prompt Writing Guide appendix for the full vocabulary.
3. Advanced planning (optional, offer but don't force):
   - Song structure: verse-chorus-verse-chorus-bridge-chorus or custom
   - BPM suggestion (encode in prompt as tempo descriptor)
   - Reference style: "something like X style" -> map to prompt tags
   - Vocal character description
4. Final confirmation: Show complete parameter summary, then generate.
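Before user-edited lyrics are passed via --lyrics, it can help to confirm that their section markers are ones the API recognizes. The sketch below is a hypothetical helper (check_lyrics_markers is not part of mmx). It assumes that a marker sits alone on its own line and that only the five lowercase markers listed under Important Notes are valid.

```sh
# Hypothetical helper: succeed if every [tag] line is a recognized marker
# and at least one marker is present; otherwise report the problem.
check_lyrics_markers() {
  # Collect lines that consist solely of a bracketed tag.
  tags=$(printf '%s\n' "$1" | grep -E '^\[[^]]+\]$' || true)
  if [ -z "$tags" ]; then
    echo "no section markers found"
    return 1
  fi
  # Keep only tags outside the recognized set.
  unknown=$(printf '%s\n' "$tags" \
    | grep -Ev '^\[(intro|verse|chorus|bridge|outro)\]$' || true)
  if [ -n "$unknown" ]; then
    printf '%s\n' "$unknown" | sed 's/^/unrecognized marker: /'
    return 1
  fi
  return 0
}
```

A failed check is a natural point to fix the markers and warn the user, as the Error Handling section suggests for invalid lyrics format.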

Step 3: Call mmx

Generate music using the mmx CLI:

Vocal with auto-generated lyrics:

    mmx music generate \
      --prompt "<prompt>" \
      --lyrics-optimizer \
      --genre "<genre>" --mood "<mood>" --vocals "<vocals>" \
      --instruments "<instruments>" --bpm <bpm> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

Vocal with user-provided lyrics:

    mmx music generate \
      --prompt "<prompt>" \
      --lyrics "<lyrics>" \
      --genre "<genre>" --mood "<mood>" --vocals "<vocals>" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

Instrumental (no vocal):

    mmx music generate \
      --prompt "<prompt>" \
      --instrumental \
      --genre "<genre>" --mood "<mood>" --instruments "<instruments>" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

Use structured flags (--genre, --mood, --vocals, --instruments, --bpm, --key, --tempo, --structure, --references, --avoid, --use-case) to give the API fine-grained control instead of cramming everything into --prompt.

Display a progress indicator while waiting. Typical generation takes 30-120 seconds.
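The three command variants in this step differ only in their lyrics flag, so an agent could assemble the argument list in one place. The wrapper below is a sketch under stated assumptions: the function name mmx_generate, the mode names, and the GENRE, MOOD, and DRY_RUN environment variables are hypothetical. Only flags that appear in this document are used.

```sh
# Hypothetical wrapper around "mmx music generate".
# Usage: mmx_generate MODE PROMPT OUT_PATH [LYRICS]
# MODE is one of: optimizer | lyrics | instrumental
mmx_generate() {
  mode=$1 prompt=$2 out=$3 lyrics=${4:-}
  # Rebuild the positional parameters as the mmx argument list.
  set -- music generate --prompt "$prompt"
  case $mode in
    optimizer)    set -- "$@" --lyrics-optimizer ;;
    lyrics)       set -- "$@" --lyrics "$lyrics" ;;
    instrumental) set -- "$@" --instrumental ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  # Optional structured flags, taken from assumed environment variables.
  if [ -n "${GENRE:-}" ]; then set -- "$@" --genre "$GENRE"; fi
  if [ -n "${MOOD:-}" ]; then set -- "$@" --mood "$MOOD"; fi
  set -- "$@" --out "$out" --quiet --non-interactive
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' mmx "$@"   # print one argument per line instead of running
  else
    mmx "$@"
  fi
}
```

With DRY_RUN=1 set, the function prints the command it would run, one argument per line, so the quoting can be inspected before any quota is spent.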

Step 4: Playback

After generation, detect an available audio player and play the file.

Detect player:

    command -v mpv || command -v ffplay || command -v afplay

Play based on detected player (in priority order):

| Player | Command | Controls |
| --- | --- | --- |
| mpv (preferred) | mpv --no-video ~/Music/minimax-gen/<filename>.mp3 | space = pause/resume, q = quit, left/right = seek |
| ffplay | ffplay -nodisp -autoexit ~/Music/minimax-gen/<filename>.mp3 | q = quit |
| afplay (macOS) | afplay ~/Music/minimax-gen/<filename>.mp3 | Ctrl+C = stop |
| None found | Do not attempt playback | Show file path only |

After starting playback, tell the user (localize all text):

    Now playing: <filename>.mp3
    Saved to: ~/Music/minimax-gen/<filename>.mp3

Do NOT show playback controls (e.g. keyboard shortcuts); they don't work in this environment since the player runs in the background.

If no player is found (localize all text):

    No audio player detected.
    File saved to: ~/Music/minimax-gen/<filename>.mp3
    Tip: Install mpv for the best playback experience (brew install mpv).
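The detection and playback logic of this step can be sketched as two small functions. The names pick_player and play_file are hypothetical, and sending player output to /dev/null while running it in the background is an assumption about how an agent would launch it. The player commands are the ones listed in this step.

```sh
# Hypothetical helper: print the first available player in priority order.
pick_player() {
  for p in mpv ffplay afplay; do
    if command -v "$p" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: start background playback, or fall back to the path.
play_file() {
  file=$1
  player=$(pick_player) || player=none
  case $player in
    mpv)    mpv --no-video "$file" >/dev/null 2>&1 & ;;
    ffplay) ffplay -nodisp -autoexit "$file" >/dev/null 2>&1 & ;;
    afplay) afplay "$file" >/dev/null 2>&1 & ;;
    *)      printf 'No audio player detected.\nFile saved to: %s\n' "$file"
            return 0 ;;
  esac
  printf 'Now playing: %s\nSaved to: %s\n' "${file##*/}" "$file"
}
```

The printed messages are the English reference text; at runtime they would be localized like every other user-facing string.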

Step 5: Feedback & Iteration

After playback, ask for feedback:

    How was this song?
    1. Love it, keep it!
    2. Not quite, adjust and regenerate
    3. Fine-tune lyrics/style then regenerate
    4. Don't want it, start over

Based on feedback:

- Satisfied: Done. Mention the file path again.
- Adjust & regenerate: Ask what to change (prompt? lyrics? style?), apply edits, re-run generation. Keep the old file with a _v1 suffix for comparison.
- Fine-tune: Enter Advanced Control Mode with the current parameters pre-filled.
- Delete & restart: Remove the file, go back to Step 0.
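Keeping the previous take under a version suffix before regenerating might look like the sketch below. The function name archive_previous is hypothetical. Continuing with _v2, _v3 and so on when _v1 already exists is an assumption that goes beyond the _v1 suffix described above.

```sh
# Hypothetical helper: rename FILE.mp3 to FILE_vN.mp3 using the first free N,
# then print the new path.
archive_previous() {
  file=$1
  base=${file%.mp3}
  n=1
  while [ -e "${base}_v${n}.mp3" ]; do
    n=$((n + 1))
  done
  mv "$file" "${base}_v${n}.mp3"
  printf '%s\n' "${base}_v${n}.mp3"
}
```

After the rename, the regenerated track can be written to the original path, so the newest take always keeps the unsuffixed name.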

Cover Mode

Generate a cover version of a song based on reference audio. Model: music-cover-free.

Reference audio requirements

- Formats: mp3, wav, flac; duration 6s to 6min; max 50MB.
- If no lyrics are provided, the original lyrics are extracted via ASR automatically.
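A local reference file can be checked against the format and size limits before it is uploaded. The sketch below is a hypothetical pre-flight helper (check_reference_audio is not part of mmx). It assumes that "50MB" means 50 * 1024 * 1024 bytes, and it omits the duration limit, which would need a media probe tool.

```sh
# Hypothetical helper: check extension and size of a local reference file.
check_reference_audio() {
  f=$1
  max_bytes=$((50 * 1024 * 1024))   # assumed interpretation of "50MB"
  if [ ! -f "$f" ]; then
    echo "not a file: $f"
    return 1
  fi
  ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|flac) ;;
    *) echo "unsupported format: $ext"; return 1 ;;
  esac
  size=$(($(wc -c < "$f")))          # arithmetic strips any padding from wc
  if [ "$size" -gt "$max_bytes" ]; then
    echo "file too large: $size bytes"
    return 1
  fi
  return 0
}
```

Failing early this way avoids spending a generation request on a file the API would reject.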

Workflow

When the user selects Cover mode:

Ask for the source audio — a local file path or URL

Ask for the target cover style (e.g., "acoustic cover, stripped-down, intimate vocal")

Optionally ask for custom lyrics or lyrics file

Commands

Cover from local file:

    mmx music cover \
      --prompt "<style>" \
      --audio-file <source.mp3> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

Cover from URL:

    mmx music cover \
      --prompt "<style>" \
      --audio <source_url> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

With custom lyrics (text):

    mmx music cover \
      --prompt "<style>" \
      --audio-file <source.mp3> \
      --lyrics "<lyrics>" \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

With custom lyrics (file):

    mmx music cover \
      --prompt "<style>" \
      --audio-file <source.mp3> \
      --lyrics-file <lyrics.txt> \
      --out ~/Music/minimax-gen/<filename>.mp3 \
      --quiet --non-interactive

Optional flags

| Flag | Description |
| --- | --- |
| --seed | Random seed 0-1000000 for reproducible results |
| --channel | 1 (mono) or 2 (stereo, default) |
| --format | mp3 (default), wav, pcm |
| --sample-rate | Sample rate in Hz (default: 44100) |
| --bitrate | Bitrate in bps (default: 256000) |
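The value ranges in the table lend themselves to a quick check before the flags are appended to a command. The validators below are a hypothetical sketch (the function names are not part of mmx), and they treat the seed range as inclusive at both ends, which is an assumption.

```sh
# Hypothetical validators for the optional flags listed above.

# --seed: an unsigned integer from 0 to 1000000 (assumed inclusive).
valid_seed() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -le 7 ] && [ "$1" -le 1000000 ]
}

# --channel: 1 (mono) or 2 (stereo).
valid_channel() {
  [ "$1" = 1 ] || [ "$1" = 2 ]
}

# --format: mp3, wav, or pcm.
valid_format() {
  case $1 in
    mp3|wav|pcm) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, valid_seed "$seed" && set -- "$@" --seed "$seed" would append the flag only when the value is in range.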

After generation

Proceed with normal playback and feedback flow (Step 4 & 5).

Error Handling

| Error | Action |
| --- | --- |
| mmx not found | npm install -g mmx-cli |
| mmx auth error (exit code 3) | mmx auth login |
| Quota exceeded (exit code 4) | Report quota limit, suggest waiting or upgrading |
| API timeout (exit code 5) | Retry once, then report failure |
| Content filter (exit code 10) | Adjust prompt to avoid filtered content |
| Invalid lyrics format | Auto-fix section markers, warn user |
| No audio player found | Save file and tell user the path, suggest installing mpv |
| Network error | Show error detail, suggest checking connection |
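The exit-code rows of the table can be turned into a small dispatch, with the single retry on timeout handled by a wrapper. The function names describe_exit and run_with_retry are hypothetical, and the message wording is illustrative. Mapping status 127 to "mmx not found" relies on the shell's usual "command not found" status rather than on anything mmx documents.

```sh
# Hypothetical helper: map an exit status to the action from the table.
describe_exit() {
  case $1 in
    0)   echo "ok" ;;
    3)   echo "auth error: run mmx auth login" ;;
    4)   echo "quota exceeded: wait or upgrade" ;;
    5)   echo "API timeout: retry once, then report failure" ;;
    10)  echo "content filter: adjust the prompt" ;;
    127) echo "mmx not found: npm install -g mmx-cli" ;;
    *)   echo "unhandled exit code $1: show error detail" ;;
  esac
}

# Hypothetical helper: run a command, retrying exactly once on exit code 5.
run_with_retry() {
  code=0
  "$@" || code=$?
  if [ "$code" -eq 5 ]; then
    code=0
    "$@" || code=$?
  fi
  return "$code"
}
```

A caller could run run_with_retry mmx music generate ... and pass the resulting status to describe_exit to choose the localized message for the user.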

Important Notes

- Never reproduce copyrighted lyrics. When doing covers, always write original lyrics inspired by the song's theme. Explain this to the user.
- Prompt language: The API prompt works best with English tags. Chinese tags are also acceptable. Mixing is OK.
- Section markers in lyrics: The API recognizes [verse], [chorus], [bridge], [outro], [intro]. Always include them when providing --lyrics.
- File management: If ~/Music/minimax-gen/ has more than 50 files, suggest cleanup when starting a new session.
- Structured params: Prefer using --genre, --mood, --vocals, --instruments, --bpm etc. over embedding everything in --prompt. This gives the API better control.
- Lyrics language via style: When the user wants lyrics in a specific language, express it through the vocal description or genre (e.g., "Japanese female vocalist", "Mandopop ballad") rather than appending a language directive to the prompt.

Appendix: Prompt Writing Guide

See references/prompt_guide.md for the complete prompt writing guide, including genre/vocal/instrument references and BPM tables.
