launch-nemo-rl: running NeMo-RL recipes on Kubernetes via nrl-k8s

This is the playbook for the `nrl-k8s` CLI at `infra/nrl_k8s/`. Follow it when the user asks to launch, iterate on, or debug a NeMo-RL recipe on a Kubernetes cluster. Verify current state (`kubectl`, `git log`, the recipe and infra files) before acting: the cluster is shared and the cost of a wrong action is high.

1. One command, two modes

There is a single top-level submission command: `nrl-k8s run`. It has two lifecycle modes.

| Mode | Invocation | When to use | Cluster after? |
| --- | --- | --- | --- |
| Ephemeral (default) | `nrl-k8s run` | One-shot. KubeRay applies a RayJob, runs it, and tears the cluster down. Best for most runs. | No (auto) |
| Long-lived | `nrl-k8s run --raycluster` | Dev loop. Reuses a matching live cluster, applies one if absent, and warns then reuses on drift (pass `--recreate` to replace). Then submits daemons and training. First choice for iteration. | Yes |

Ask: do I need this cluster after the run? If yes, use `--raycluster`. Otherwise use the default (ephemeral).

The rest of the CLI is for observability and stage-by-stage control.
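The mode choice above reduces to one question, which can be sketched as a small helper. This is an illustrative sketch, not part of `nrl-k8s`: the function name `build_run_command` and its parameters are hypothetical, and only the `run` subcommand and the `--raycluster` and `--recreate` flags come from the text above. Any recipe or config arguments are left out because this excerpt does not show them. Rejecting `--recreate` in ephemeral mode is an assumption based on the flag being described only for the long-lived mode.

```python
from typing import List


def build_run_command(keep_cluster: bool, recreate: bool = False) -> List[str]:
    """Return an argv for `nrl-k8s run` in the chosen lifecycle mode.

    keep_cluster answers "do I need this cluster after the run?".
    recreate asks to replace a drifted long-lived cluster instead of reusing it.
    Hypothetical helper: only the subcommand and flags are taken from the playbook.
    """
    argv = ["nrl-k8s", "run"]  # ephemeral mode is the default
    if keep_cluster:
        argv.append("--raycluster")  # long-lived mode: cluster stays up afterwards
        if recreate:
            argv.append("--recreate")
    elif recreate:
        # Assumption: --recreate is only meaningful together with --raycluster.
        raise ValueError("--recreate only applies with --raycluster")
    return argv


if __name__ == "__main__":
    print(" ".join(build_run_command(keep_cluster=False)))
    print(" ".join(build_run_command(keep_cluster=True)))
    print(" ".join(build_run_command(keep_cluster=True, recreate=True)))
```

Building the argv as a list rather than a string keeps it safe to pass to `subprocess.run` later without shell quoting, once the real positional arguments are known.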
Install
npx skills add https://github.com/nvidia/skills --skill launch-nemo-rl