nemo-mbridge-perf-moe-long-context

Installs: 554
Rank: #9339

Install

npx skills add https://github.com/nvidia/skills --skill nemo-mbridge-perf-moe-long-context

Repository: nvidia/skills
GitHub Stars: 1.3K
First Seen: May 29, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass

MoE Long-Context Training

Stable docs: @docs/training/moe-optimization.md
Card: @skills/nemo-mbridge-perf-moe-long-context/card.yaml

What Changes At Long Context

Once sequence length moves well past the 4K-token regime, attention memory and activation residency become the dominant constraints. For MoE models, that usually means you need some combination of:

- context parallelism (CP)
- selective recompute
- lower precision
- CPU offload for optimizer state
- a token dispatcher and pipeline-parallel (PP) layout that do not waste the smaller data-parallel (DP) budget that remains

Rounded Scaling Patterns
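The shrinking DP budget mentioned above follows from simple arithmetic: in Megatron-style layouts, the GPUs not consumed by tensor, pipeline, and context parallelism form the data-parallel dimension, so raising CP for long sequences divides the DP size. The sketch below assumes DP = world_size / (TP × PP × CP) and ignores expert parallelism (EP) and its interaction with DP. `ParallelLayout` and its methods are hypothetical illustrations, not a Megatron Bridge API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper for a dense-layer parallel layout (EP is ignored)."""

    world_size: int  # total number of GPUs
    tp: int  # tensor-parallel size
    pp: int  # pipeline-parallel size
    cp: int  # context-parallel size

    def model_parallel_size(self) -> int:
        # Ranks that cooperate on a single model replica.
        return self.tp * self.pp * self.cp

    def data_parallel_size(self) -> int:
        # Whatever remains after TP, PP, and CP is the DP budget.
        mp = self.model_parallel_size()
        if self.world_size % mp != 0:
            raise ValueError(
                f"world_size={self.world_size} is not divisible by TP*PP*CP={mp}"
            )
        return self.world_size // mp

    def tokens_per_cp_rank(self, seq_len: int) -> int:
        # CP shards the sequence dimension, so each CP rank holds activations
        # for seq_len / CP tokens. Real implementations may require stricter
        # divisibility than this simple check.
        if seq_len % self.cp != 0:
            raise ValueError(f"seq_len={seq_len} is not divisible by CP={self.cp}")
        return seq_len // self.cp


if __name__ == "__main__":
    short_ctx = ParallelLayout(world_size=64, tp=2, pp=4, cp=1)
    long_ctx = ParallelLayout(world_size=64, tp=2, pp=4, cp=4)
    for name, layout, seq_len in [("4K", short_ctx, 4096), ("64K", long_ctx, 65536)]:
        print(
            name,
            "DP =", layout.data_parallel_size(),
            "tokens/rank =", layout.tokens_per_cp_rank(seq_len),
        )
```

With 64 GPUs at fixed TP=2 and PP=4, moving from CP=1 to CP=4 cuts DP from 8 to 2, while each CP rank holds 16,384 of the 65,536 tokens. That smaller DP group is the budget the dispatcher and PP layout choices should avoid wasting.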
