CUDA Graphs

Stable documentation: @docs/training/cuda-graphs.md
Card: @skills/nemo-mbridge-perf-cuda-graphs/card.yaml

What It Is

CUDA graphs capture GPU operations once and replay them with minimal host-driver overhead. Bridge supports two implementations:

| cuda_graph_impl | Mechanism | Scope support |
|---|---|---|
| "local" | MCore FullCudaGraphWrapper wrapping entire fwd+bwd | full_iteration |
| "transformer_engine" | TE make_graphed_callables() per layer | attn, mlp, moe, moe_router, moe_preprocess, mamba |

Quick Decision

…

Installs: 549
Repository: nvidia/skills
GitHub Stars: 1.3K
First Seen: May 29, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
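The pairing of implementations and scopes listed above can be sketched as a small lookup, which may help catch a mismatched setting before a run starts. This is only an illustration: `SUPPORTED_SCOPES` and `check_cuda_graph_scopes` are hypothetical names, not part of Bridge, MCore, or Transformer Engine, and the mapping simply mirrors the listing above, so the real libraries may accept other combinations.

```python
from __future__ import annotations

# Hypothetical lookup (not a Bridge API): mirrors the implementation/scope
# listing above. The real libraries may accept other combinations.
SUPPORTED_SCOPES: dict[str, set[str]] = {
    "local": {"full_iteration"},
    "transformer_engine": {
        "attn", "mlp", "moe", "moe_router", "moe_preprocess", "mamba",
    },
}


def check_cuda_graph_scopes(impl: str, scopes: list[str]) -> list[str]:
    """Return the scopes unchanged if each one is listed for `impl`.

    Raises ValueError for an unknown implementation or an unlisted scope.
    """
    if impl not in SUPPORTED_SCOPES:
        known = ", ".join(sorted(SUPPORTED_SCOPES))
        raise ValueError(f"unknown cuda_graph_impl {impl!r}; expected one of: {known}")
    unsupported = sorted(set(scopes) - SUPPORTED_SCOPES[impl])
    if unsupported:
        raise ValueError(f"scopes {unsupported} are not listed for cuda_graph_impl={impl!r}")
    return list(scopes)


if __name__ == "__main__":
    # Per-layer scopes are listed under the Transformer Engine implementation.
    print(check_cuda_graph_scopes("transformer_engine", ["attn", "mlp"]))
    # full_iteration is listed only under the local implementation.
    print(check_cuda_graph_scopes("local", ["full_iteration"]))
```

A check like this would run once when the configuration is built, so that a scope such as `full_iteration` paired with `"transformer_engine"` fails early with a readable message.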
nemo-mbridge-perf-cuda-graphs
Install
npx skills add https://github.com/nvidia/skills --skill nemo-mbridge-perf-cuda-graphs