nemo-mbridge-perf-memory-tuning

Installs: 591

Rank: #9074

Installs: 561
Repository: nvidia/skills
GitHub Stars: 1.3K
First Seen: May 29, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass

Memory Tuning

Stable docs: @docs/parallelisms.md
Card: @skills/nemo-mbridge-perf-memory-tuning/card.yaml

What It Is

GPU out-of-memory (OOM) failures during training often stem from memory fragmentation rather than raw capacity. PyTorch's default CUDA allocator can leave unusable gaps between allocations. The single most effective fix is:

    export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

This tells PyTorch to use expandable (non-fixed-size) memory segments, which dramatically reduces fragmentation and often eliminates borderline OOM without any model or parallelism changes.
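When the training job is launched from a Python entry point rather than a shell, the same setting can be applied programmatically. The following is a minimal sketch, not part of the skill itself: the helper names `with_expandable_segments` and `configure_allocator` are hypothetical, and it assumes the variable is set before PyTorch initializes its CUDA allocator (setting it before `import torch` is the safe choice). It preserves any options already present in `PYTORCH_CUDA_ALLOC_CONF`, which is a comma-separated list of `option:value` pairs.

```python
import os

ENV_VAR = "PYTORCH_CUDA_ALLOC_CONF"
OPTION = "expandable_segments"


def with_expandable_segments(existing: str) -> str:
    """Return an allocator config string with expandable segments enabled.

    Hypothetical helper: appends the option instead of parsing the string,
    so values that themselves contain commas are left untouched.
    """
    existing = existing.strip()
    if not existing:
        return f"{OPTION}:True"
    if OPTION in existing:
        # An explicit choice is already present; do not override it.
        return existing
    return f"{existing},{OPTION}:True"


def configure_allocator(environ=os.environ) -> str:
    """Write the updated config into the given environment mapping."""
    value = with_expandable_segments(environ.get(ENV_VAR, ""))
    environ[ENV_VAR] = value
    return value


if __name__ == "__main__":
    print(configure_allocator())
    # Import torch only after this point so the allocator sees the setting.
```

Calling `configure_allocator()` at the very top of the training script has the same effect as the `export` line for that process only. Child processes started afterwards inherit the value through the environment.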