# torchforge-rl-training


## Install

```bash
npx skills add https://github.com/davila7/claude-code-templates --skill torchforge-rl-training
```
# torchforge: PyTorch-Native Agentic RL Library
torchforge is Meta's PyTorch-native RL library that separates infrastructure concerns from algorithm concerns. It enables rapid RL research: you focus on the algorithm while torchforge handles distributed training, inference, and weight synchronization automatically.
## When to Use torchforge

Choose torchforge when you need:

- Clean separation between RL algorithms and infrastructure
- PyTorch-native abstractions (no Ray dependency)
- Easy algorithm experimentation (GRPO, DAPO, SAPO in ~100 lines)
- Scalable training with the Monarch actor system
- Integration with TorchTitan for model parallelism
Consider alternatives when:

- You need production-ready stability → use miles or verl
- You want Megatron-native training → use slime
- torchforge is experimental, and APIs may change
## Key Features

| Feature | Description |
|---|---|
| Algorithm isolation | Implement RL algorithms without touching infrastructure |
| Scalability | From a single GPU to thousands via Monarch |
| Modern stack | TorchTitan (training), vLLM (inference), TorchStore (sync) |
| Loss functions | GRPO, DAPO, CISPO, GSPO, SAPO built in |

## Architecture Overview

```text
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Your Code)                           │
│ - Define reward models, loss functions, sampling        │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Forge API Layer                                         │
│ - Episode, Group dataclasses                            │
│ - Service interfaces (async/await)                      │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Distributed Services (Monarch)                          │
│ ├── Trainer (TorchTitan FSDP)                           │
│ ├── Generator (vLLM inference)                          │
│ ├── Reference Model (frozen KL baseline)                │
│ └── Reward Actors (compute rewards)                     │
└─────────────────────────────────────────────────────────┘
```
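The pieces connect roughly as follows. This is an illustrative sketch only: the service handles and the `generator.generate` call are assumed names, while `trainer.train_step`, the `Completion` fields, and the reward signature follow the API reference later in this document.

```python
# Hypothetical glue code showing one step flowing through the three layers.
async def rl_step(prompt, target, generator, trainer, reward_fn, make_batch):
    # Generator service (vLLM): one prompt -> a group of Completions
    completions = await generator.generate(prompt)  # assumed method name

    # Application layer: score each completion with your reward function
    rewards = [reward_fn(prompt, c.text, target) for c in completions]

    # Forge API layer: your algorithm turns completions + rewards into the
    # dictionary-based batches described under "Training Batch Format" below
    inputs, targets = make_batch(completions, rewards)

    # Trainer service (TorchTitan): returns the loss as a float
    return trainer.train_step(inputs, targets)
```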

## Installation

```bash
# Create environment
conda create -n forge python=3.12
conda activate forge

# Install (handles PyTorch nightly + dependencies)
./scripts/install.sh

# Verify
python -c "import torch, forge, vllm; print('OK')"
```

### ROCm Installation

```bash
./scripts/install_rocm.sh
```

## Quick Start

**SFT Training (2+ GPUs)**

```bash
python -m apps.sft.main --config apps/sft/llama3_8b.yaml
```

**GRPO Training (3+ GPUs)**

```bash
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
```

## Workflow 1: GRPO Training for Math Reasoning

Use this workflow for training reasoning models with group-relative advantages.

### Prerequisites Checklist

- 3+ GPUs (GPU0: trainer, GPU1: ref_model, GPU2: generator)
- A model from the HuggingFace Hub
- A training dataset (GSM8K, MATH, etc.)

### Step 1: Create Configuration

```yaml
# config/grpo_math.yaml
model: "Qwen/Qwen2.5-7B-Instruct"

dataset:
  path: "openai/gsm8k"
  split: "train"
  streaming: true

training:
  batch_size: 4
  learning_rate: 1e-6
  seq_len: 4096
  dtype: bfloat16
  gradient_accumulation_steps: 4

grpo:
  n_samples: 8       # Responses per prompt
  clip_low: 0.2
  clip_high: 0.28
  beta: 0.1          # KL penalty coefficient
  temperature: 0.7

services:
  generator:
    procs: 1
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 1
    num_replicas: 1
    with_gpus: true
  ref_model:
    procs: 1
    num_replicas: 1
    with_gpus: true
```
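For intuition: `n_samples` sets the group size. Each prompt gets `n_samples` responses, and each response's advantage is its reward standardized within its own group. A minimal sketch of that computation (illustrative, not torchforge's internal code):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: [num_prompts, n_samples], one reward per sampled response
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, n_samples=8, binary math rewards: correct answers get
# positive advantages, incorrect ones negative
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```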

### Step 2: Define Reward Function

```python
# rewards.py
# Built-in reward functions live in forge.data.rewards
from forge.data.rewards import MathReward, ThinkingReward
import re

# Or define your own reward function
class CustomMathReward:
    def __call__(self, prompt: str, response: str, target: str) -> float:
        # Extract the \boxed{...} answer from the response
        match = re.search(r'\\boxed\{([^}]+)\}', response)
        if not match:
            return 0.0
        answer = match.group(1).strip()
        return 1.0 if answer == target else 0.0
```
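A quick sanity check of the reward above (the strings are made-up examples):

```python
reward = CustomMathReward()
print(reward("What is 6*7?", r"The answer is \boxed{42}.", target="42"))  # 1.0
print(reward("What is 6*7?", "I am not sure.", target="42"))              # 0.0
```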

### Step 3: Launch Training

```bash
python -m apps.grpo.main --config config/grpo_math.yaml
```

### Step 4: Monitor Progress

- Check the W&B dashboard for loss curves
- Verify that entropy is decreasing (the policy is becoming more deterministic)
- Monitor KL divergence (it should stay bounded)

## Workflow 2: Custom Loss Function

Use this workflow to implement new RL algorithms.

### Step 1: Create Loss Class

```python
# src/forge/losses/custom_loss.py
import torch
import torch.nn as nn

class CustomLoss(nn.Module):
    def __init__(self, clip_range: float = 0.2, beta: float = 0.1):
        super().__init__()
        self.clip_range = clip_range
        self.beta = beta

    def forward(
        self,
        logprobs: torch.Tensor,
        ref_logprobs: torch.Tensor,
        advantages: torch.Tensor,
        padding_mask: torch.Tensor,
    ) -> torch.Tensor:
        # Compute importance ratio
        ratio = torch.exp(logprobs - ref_logprobs)

        # Clipped policy gradient
        clipped_ratio = torch.clamp(ratio, 1 - self.clip_range, 1 + self.clip_range)
        pg_loss = -torch.min(ratio * advantages, clipped_ratio * advantages)

        # KL penalty
        kl = ref_logprobs - logprobs

        # Apply mask and aggregate
        masked_loss = (pg_loss + self.beta * kl) * padding_mask
        loss = masked_loss.sum() / padding_mask.sum()
        return loss
```

### Step 2: Integrate into Application

```python
# apps/custom/main.py
from forge.losses.custom_loss import CustomLoss

loss_fn = CustomLoss(clip_range=0.2, beta=0.1)

# In the training loop
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```
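Before wiring the loss into a full app, you can smoke-test it on dummy tensors (the shapes here are illustrative):

```python
import torch
from forge.losses.custom_loss import CustomLoss

loss_fn = CustomLoss(clip_range=0.2, beta=0.1)

batch, seq = 2, 16
logprobs = torch.randn(batch, seq)
ref_logprobs = torch.randn(batch, seq)
advantages = torch.randn(batch, 1)    # one advantage per response, broadcast over tokens
padding_mask = torch.ones(batch, seq)

loss = loss_fn(logprobs, ref_logprobs, advantages, padding_mask)
print(loss)  # a finite scalar tensor
```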

## Workflow 3: Multi-GPU Distributed Training

Use this workflow for scaling to multiple GPUs or nodes.

### Configuration for Distributed

```yaml
# config/distributed.yaml
model: "meta-llama/Meta-Llama-3.1-8B-Instruct"

parallelism:
  tensor_parallel_degree: 2      # Split model across GPUs
  pipeline_parallel_degree: 1
  data_parallel_shard_degree: 2

services:
  generator:
    procs: 2      # 2 processes for TP=2
    num_replicas: 1
    with_gpus: true
  trainer:
    procs: 2
    num_replicas: 1
    with_gpus: true
```
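As a sizing rule of thumb, each proc with `with_gpus: true` occupies one GPU, so a config's GPU budget is the sum of `procs * num_replicas` across services (an assumption for illustration; the scheduler is the source of truth):

```python
# GPU budget for the config above, assuming one GPU per proc
services = {
    "generator": {"procs": 2, "num_replicas": 1},
    "trainer":   {"procs": 2, "num_replicas": 1},
}
total_gpus = sum(s["procs"] * s["num_replicas"] for s in services.values())
print(total_gpus)  # 4
```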

### Launch with SLURM

```bash
# Submit job
sbatch --nodes=2 --gpus-per-node=8 run_grpo.sh
```

### Launch Locally (Multi-GPU)

```bash
# 8 GPU setup
python -m apps.grpo.main \
  --config config/distributed.yaml \
  --trainer.procs 4 \
  --generator.procs 4
```

## Core API Reference

### Training Batch Format

torchforge uses dictionary-based batches for training:

```python
# inputs: list of dicts with torch.Tensor values
inputs = [{"tokens": torch.Tensor}]

# targets: list of dicts with training signals
targets = [{
    "response": torch.Tensor,
    "ref_logprobs": torch.Tensor,
    "advantages": torch.Tensor,
    "padding_mask": torch.Tensor,
}]

# train_step returns the loss as a float
loss = trainer.train_step(inputs, targets)
```

### Completion

Generated output from vLLM:

```python
from dataclasses import dataclass

@dataclass
class Completion:
    text: str                # Generated text
    token_ids: list[int]     # Token IDs
    logprobs: list[float]    # Log probabilities
    metadata: dict           # Custom metadata
```
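Illustrative use of a Completion (the values are made up):

```python
c = Completion(text="42", token_ids=[19, 17], logprobs=[-0.11, -0.05], metadata={})
sequence_logprob = sum(c.logprobs)  # total log-probability of the generated tokens
print(c.text, sequence_logprob)
```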

### Built-in Loss Functions

Loss functions live in the forge.losses module:

```python
from forge.losses import SimpleGRPOLoss, ReinforceLoss

# SimpleGRPOLoss for GRPO training
loss_fn = SimpleGRPOLoss(beta=0.1)

# Forward pass
loss = loss_fn(
    logprobs=logprobs,
    ref_logprobs=ref_logprobs,
    advantages=advantages,
    padding_mask=padding_mask,
)
```

ReinforceLoss:

```python
from forge.losses.reinforce_loss import ReinforceLoss

# With optional importance-ratio clipping
loss_fn = ReinforceLoss(clip_ratio=0.2)
```
## Common Issues and Solutions

### Issue: Not Enough GPUs

**Symptoms**: "Insufficient GPU resources" error

**Solutions**:

```yaml
# Reduce service requirements
services:
  generator:
    procs: 1
    with_gpus: true
  trainer:
    procs: 1
    with_gpus: true

# Remove ref_model (uses generator weights),
# or run the reference model on CPU:
ref_model:
  with_gpus: false
```
### Issue: OOM During Generation

**Symptoms**: CUDA OOM in vLLM

**Solutions**:

```yaml
# Reduce batch size
grpo:
  n_samples: 4    # Reduce from 8

# Or reduce sequence length
training:
  seq_len: 2048
```
### Issue: Slow Weight Sync

**Symptoms**: Long pauses between training and generation

**Solutions**:

```bash
# Enable RDMA (if available)
export TORCHSTORE_USE_RDMA=1
```

```yaml
# Or reduce sync frequency
training:
  sync_interval: 10    # Sync every 10 steps
```

### Issue: Policy Collapse

**Symptoms**: Entropy drops to zero, reward stops improving

**Solutions**:

```yaml
# Increase KL penalty
grpo:
  beta: 0.2    # Increase from 0.1

# Or add an entropy bonus
training:
  entropy_coef: 0.01
```
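To catch collapse early, you can also track the mean entropy of the sampled tokens between updates. A hypothetical monitoring helper (not part of torchforge's API); `logprobs` and `padding_mask` are `[batch, seq]` tensors as in the loss examples above:

```python
import torch

def mean_sampled_entropy(logprobs: torch.Tensor, padding_mask: torch.Tensor) -> float:
    # The negative log-probability of each sampled token is a Monte Carlo
    # estimate of the policy's entropy; a value near zero means the policy
    # has become nearly deterministic.
    return (-(logprobs * padding_mask).sum() / padding_mask.sum()).item()
```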

## Resources

- Documentation: https://meta-pytorch.org/torchforge
- GitHub: https://github.com/meta-pytorch/torchforge
- Discord: https://discord.gg/YsTYBh6PD9
- TorchTitan: https://github.com/pytorch/torchtitan
- Monarch: https://github.com/meta-pytorch/monarch
