Memory Tuning

Stable docs: @docs/parallelisms.md
Card: @skills/nemo-mbridge-perf-memory-tuning/card.yaml

What It Is

GPU out-of-memory (OOM) failures during training often stem from memory fragmentation rather than a lack of raw capacity. PyTorch's default CUDA caching allocator can leave gaps between allocations that are too small to be reused. The single most effective fix is:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

This tells PyTorch to use expandable (non-fixed-size) memory segments. They dramatically reduce fragmentation and often eliminate borderline OOM errors without any model or parallelism changes.

Installs: 561
Repository: nvidia/skills
GitHub Stars: 1.3K
First Seen: May 29, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
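A minimal launcher sketch, assuming a POSIX shell. The helper add_expandable_segments is hypothetical and is not part of NeMo or PyTorch. It rewrites PYTORCH_CUDA_ALLOC_CONF so that expandable_segments:True is set. Any other comma-separated <option>:<value> pairs already present, such as max_split_size_mb:128, are kept. PyTorch reads this variable when its CUDA caching allocator starts up. The variable must therefore be in the training process's environment before its first CUDA allocation, and exporting it from the launching shell is the simplest way to guarantee that.

```shell
#!/bin/sh
# Hypothetical helper (not a NeMo or PyTorch API): print a PYTORCH_CUDA_ALLOC_CONF
# value with expandable_segments:True applied, keeping any other option:value pairs.
add_expandable_segments() {
  out=""
  old_ifs=$IFS
  IFS=','      # split the existing value on commas
  set -f       # disable globbing while iterating over unquoted fields
  for item in $1; do
    case "$item" in
      ""|expandable_segments:*) ;;      # drop empty fields and any prior setting
      *) out="${out:+$out,}$item" ;;    # keep every other option as-is
    esac
  done
  set +f
  IFS=$old_ifs
  printf '%s\n' "${out:+$out,}expandable_segments:True"
}

PYTORCH_CUDA_ALLOC_CONF="$(add_expandable_segments "${PYTORCH_CUDA_ALLOC_CONF:-}")"
export PYTORCH_CUDA_ALLOC_CONF
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Start the training command from this shell so that it inherits the variable.
```

An existing expandable_segments entry is replaced rather than duplicated, so the script can be rerun safely in a shell where the variable is already set.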
nemo-mbridge-perf-memory-tuning
Install
npx skills add https://github.com/nvidia/skills --skill nemo-mbridge-perf-memory-tuning