# torchforge: PyTorch-Native Agentic RL Library

torchforge is Meta's PyTorch-native RL library that separates infrastructure concerns from algorithm concerns. It enables rapid RL research by letting you focus on algorithms while it handles distributed training, inference, and weight sync automatically.
## When to Use torchforge

Choose torchforge when you need:

- Clean separation between RL algorithms and infrastructure
- PyTorch-native abstractions (no Ray dependency)
- Easy algorithm experimentation (GRPO, DAPO, SAPO in ~100 lines)
- Scalable training with the Monarch actor system
- Integration with TorchTitan for model parallelism
Consider alternatives when:

- You need production-ready stability → use miles or verl
- You want Megatron-native training → use slime

torchforge is experimental and APIs may change.
## Key Features

- **Algorithm isolation**: Implement RL algorithms without touching infrastructure
- **Scalability**: From single GPU to thousands via Monarch
- **Modern stack**: TorchTitan (training), vLLM (inference), TorchStore (sync)
- **Loss functions**: GRPO, DAPO, CISPO, GSPO, SAPO built-in

## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Your Code)                           │
│ - Define reward models, loss functions, sampling        │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Forge API Layer                                         │
│ - Episode, Group dataclasses                            │
│ - Service interfaces (async/await)                      │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Distributed Services (Monarch)                          │
│ ├── Trainer (TorchTitan FSDP)                           │
│ ├── Generator (vLLM inference)                          │
│ ├── Reference Model (frozen KL baseline)                │
│ └── Reward Actors (compute rewards)                     │
└─────────────────────────────────────────────────────────┘
```
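An application drives these services through async calls. The sketch below is purely illustrative: the free functions `generate`, `compute_reward`, and `train_step` are hypothetical stand-ins for the real Monarch-backed service handles, but the generator → reward → trainer data flow matches the diagram above.

```python
import asyncio

# Hypothetical stand-ins for the services in the diagram above.
async def generate(prompt: str) -> str:            # Generator (vLLM)
    return prompt + " The answer is 4."

async def compute_reward(response: str) -> float:  # Reward actor
    return 1.0 if "4" in response else 0.0

async def train_step(reward: float) -> float:      # Trainer (TorchTitan)
    return 1.0 - reward                            # stand-in "loss"

async def main() -> None:
    response = await generate("What is 2 + 2?")
    reward = await compute_reward(response)
    loss = await train_step(reward)
    print(f"reward={reward:.1f}, loss={loss:.1f}")

asyncio.run(main())
```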
## Installation

```bash
# Create environment
conda create -n forge python=3.12
conda activate forge

# Install (handles PyTorch nightly + dependencies)
./scripts/install.sh

# Verify
python -c "import torch, forge, vllm; print('OK')"
```

### ROCm Installation

```bash
./scripts/install_rocm.sh
```

## Quick Start

### SFT Training (2+ GPUs)

```bash
python -m apps.sft.main --config apps/sft/llama3_8b.yaml
```

### GRPO Training (3+ GPUs)

```bash
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
```

## Workflow 1: GRPO Training for Math Reasoning

Use this workflow for training reasoning models with group-relative advantages.

### Prerequisites Checklist

- 3+ GPUs (GPU0: trainer, GPU1: ref_model, GPU2: generator); see the check below
- Model from HuggingFace Hub
- Training dataset (GSM8K, MATH, etc.)
### Step 1: Create Configuration

```yaml
# config/grpo_math.yaml
model: "Qwen/Qwen2.5-7B-Instruct"

dataset:
  path: "openai/gsm8k"
  split: "train"
  streaming: true

training:
  batch_size: 4
  learning_rate: 1e-6
  seq_len: 4096
  dtype: bfloat16
  gradient_accumulation_steps: 4

grpo:
  n_samples: 8       # Responses per prompt
  clip_low: 0.2
  clip_high: 0.28
  beta: 0.1          # KL penalty coefficient
  temperature: 0.7

services:
  generator:
    procs: 1
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 1
    num_replicas: 1
    with_gpus: true
  ref_model:
    procs: 1
    num_replicas: 1
    with_gpus: true
```
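A note on `n_samples`: it sets the group size for the group-relative advantages that GRPO is named after. torchforge computes these internally; the sketch below just illustrates the standard formulation from the DeepSeekMath paper, where each response's advantage is its reward normalized within the prompt's group.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """A_i = (r_i - mean(r)) / (std(r) + eps), normalized within one prompt's group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, n_samples = 8 responses, binary math rewards:
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
print(group_relative_advantages(rewards))  # correct answers get positive advantage
```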
### Step 2: Define Reward Function

```python
# rewards.py
import re

# Built-in reward functions live in forge.data.rewards
from forge.data.rewards import MathReward, ThinkingReward


# Or define your own reward function
class CustomMathReward:
    def __call__(self, prompt: str, response: str, target: str) -> float:
        # Extract the answer from a \boxed{...} span in the response
        match = re.search(r"\\boxed\{([^}]+)\}", response)
        if not match:
            return 0.0
        answer = match.group(1).strip()
        return 1.0 if answer == target else 0.0
```

### Step 3: Launch Training

```bash
python -m apps.grpo.main --config config/grpo_math.yaml
```

### Step 4: Monitor Progress

- Check the W&B dashboard for loss curves
- Verify entropy is decreasing (policy becoming more deterministic); see the sketch below
- Monitor KL divergence (should stay bounded)
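One cheap way to eyeball entropy without extra forward passes is to reuse the sampled-token logprobs you already have: the negative mean logprob of sampled tokens is a Monte Carlo estimate of the policy's entropy. This is a diagnostic sketch, not a torchforge API:

```python
import torch

def sampled_entropy(logprobs: torch.Tensor, padding_mask: torch.Tensor) -> float:
    """Monte Carlo estimate: H(pi) ~= E[-log pi(sampled token)], in nats/token."""
    return float(-(logprobs * padding_mask).sum() / padding_mask.sum())

logprobs = torch.tensor([-0.1, -2.3, -0.5, -0.9])  # per-token log-probs of sampled tokens
mask = torch.ones_like(logprobs)                   # no padding in this toy example
print(f"entropy ~ {sampled_entropy(logprobs, mask):.2f} nats/token")
```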
## Workflow 2: Custom Loss Function

Use this workflow to implement new RL algorithms.

### Step 1: Create Loss Class

```python
# src/forge/losses/custom_loss.py
import torch
import torch.nn as nn


class CustomLoss(nn.Module):
    def __init__(self, clip_range: float = 0.2, beta: float = 0.1):
        super().__init__()
        self.clip_range = clip_range
        self.beta = beta

    def forward(
        self,
        logprobs: torch.Tensor,
        ref_logprobs: torch.Tensor,
        advantages: torch.Tensor,
        padding_mask: torch.Tensor,
    ) -> torch.Tensor:
        # Compute importance ratio
        ratio = torch.exp(logprobs - ref_logprobs)

        # Clipped policy gradient
        clipped_ratio = torch.clamp(ratio, 1 - self.clip_range, 1 + self.clip_range)
        pg_loss = -torch.min(ratio * advantages, clipped_ratio * advantages)

        # KL penalty
        kl = ref_logprobs - logprobs

        # Apply mask and aggregate
        masked_loss = (pg_loss + self.beta * kl) * padding_mask
        loss = masked_loss.sum() / padding_mask.sum()
        return loss
```
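Before wiring a new loss into training, a cheap sanity check catches shape and sign bugs: with identical policy and reference logprobs and zero advantages, the loss above should be exactly zero.

```python
import torch

from forge.losses.custom_loss import CustomLoss

loss_fn = CustomLoss(clip_range=0.2, beta=0.1)

logprobs = torch.randn(2, 16)     # fake per-token log-probs, shape (batch, seq)
advantages = torch.zeros(2, 16)   # zero advantages => zero policy-gradient term
padding_mask = torch.ones(2, 16)

# ratio == 1 and kl == 0 when logprobs == ref_logprobs, so the loss is 0.
loss = loss_fn(logprobs, logprobs.clone(), advantages, padding_mask)
print(loss.item())  # 0.0
```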
### Step 2: Integrate into Application

```python
# apps/custom/main.py
from forge.losses.custom_loss import CustomLoss

loss_fn = CustomLoss(clip_range=0.2, beta=0.1)

# In the training loop
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```

## Workflow 3: Multi-GPU Distributed Training

Use this workflow for scaling to multiple GPUs or nodes.
### Configuration for Distributed

```yaml
# config/distributed.yaml
model: "meta-llama/Meta-Llama-3.1-8B-Instruct"

parallelism:
  tensor_parallel_degree: 2       # Split model across GPUs
  pipeline_parallel_degree: 1
  data_parallel_shard_degree: 2

services:
  generator:
    procs: 2          # 2 processes for TP=2
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 2
    num_replicas: 1
    with_gpus: true
```
### Launch with SLURM

```bash
# Submit job
sbatch --nodes=2 --gpus-per-node=8 run_grpo.sh
```
### Launch Locally (Multi-GPU)

```bash
# 8 GPU setup
python -m apps.grpo.main \
  --config config/distributed.yaml \
  --trainer.procs 4 \
  --generator.procs 4
```

## Core API Reference

### Training Batch Format

torchforge uses dictionary-based batches for training:
```python
# inputs: list of dicts with torch.Tensor values
inputs = [{"tokens": torch.Tensor}]

# targets: list of dicts with training signals
targets = [{
    "response": torch.Tensor,
    "ref_logprobs": torch.Tensor,
    "advantages": torch.Tensor,
    "padding_mask": torch.Tensor,
}]

# train_step returns the loss as a float
loss = trainer.train_step(inputs, targets)
```

### Completion

Generated output from vLLM:

```python
@dataclass
class Completion:
    text: str                 # Generated text
    token_ids: list[int]      # Token IDs
    logprobs: list[float]     # Log probabilities
    metadata: dict            # Custom metadata
```
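The fields line up one-to-one per generated token, so, for example, the sequence log-probability is just the sum of `logprobs`. A standalone sketch (dataclass redeclared here, token IDs illustrative only):

```python
from dataclasses import dataclass

@dataclass
class Completion:
    text: str                 # Generated text
    token_ids: list[int]      # Token IDs
    logprobs: list[float]     # Log probabilities
    metadata: dict            # Custom metadata

c = Completion(
    text=" The answer is 4.",
    token_ids=[576, 4226, 374, 220, 19, 13],
    logprobs=[-0.4, -0.2, -0.1, -0.3, -0.05, -0.2],
    metadata={"prompt": "What is 2 + 2?"},
)
print(sum(c.logprobs))  # log p(response | prompt) under the generator
```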
### Built-in Loss Functions

Loss functions live in the forge.losses module:

```python
from forge.losses import SimpleGRPOLoss, ReinforceLoss

# SimpleGRPOLoss for GRPO training
loss_fn = SimpleGRPOLoss(beta=0.1)

# Forward pass
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```

#### ReinforceLoss

```python
from forge.losses.reinforce_loss import ReinforceLoss

# With optional importance ratio clipping
loss_fn = ReinforceLoss(clip_ratio=0.2)
```
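For orientation, the GRPO objective as defined in the DeepSeekMath paper is shown below; `beta` corresponds to the KL coefficient $\beta$. SimpleGRPOLoss is a simplified variant, so treat this as background rather than its exact implementation:

$$
\mathcal{J}_{\mathrm{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\Big(\min\big(r_{i,t}A_i,\ \operatorname{clip}(r_{i,t},\,1-\varepsilon,\,1+\varepsilon)\,A_i\big) - \beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big]\Big)\right]
$$

where $r_{i,t} = \pi_\theta(o_{i,t} \mid q, o_{i,<t}) / \pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q, o_{i,<t})$ is the importance ratio and $A_i$ is the group-relative advantage of response $i$.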
## Common Issues and Solutions

### Issue: Not Enough GPUs

**Symptoms**: "Insufficient GPU resources" error

**Solutions**:

```yaml
# Reduce service requirements
services:
  generator:
    procs: 1
    with_gpus: true
  trainer:
    procs: 1
    with_gpus: true

# Remove ref_model (uses generator weights),
# or run the reference model on CPU:
ref_model:
  with_gpus: false
```
### Issue: OOM During Generation

**Symptoms**: CUDA OOM in vLLM

**Solutions**:

```yaml
# Reduce batch size
grpo:
  n_samples: 4    # Reduced from 8

# Or reduce sequence length
training:
  seq_len: 2048
```
### Issue: Slow Weight Sync

**Symptoms**: Long pauses between training and generation

**Solutions**:

```bash
# Enable RDMA (if available)
export TORCHSTORE_USE_RDMA=1
```

```yaml
# Or reduce sync frequency
training:
  sync_interval: 10    # Sync every 10 steps
```
### Issue: Policy Collapse

**Symptoms**: Entropy drops to zero, reward stops improving

**Solutions**:

```yaml
# Increase KL penalty
grpo:
  beta: 0.2    # Increased from 0.1

# Or add an entropy bonus
training:
  entropy_coef: 0.01
```

## Resources

- Documentation: https://meta-pytorch.org/torchforge
- GitHub: https://github.com/meta-pytorch/torchforge
- Discord: https://discord.gg/YsTYBh6PD9
- TorchTitan: https://github.com/pytorch/torchtitan
- Monarch: https://github.com/meta-pytorch/monarch