# PufferLib - High-Performance Reinforcement Learning

## Overview
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It trains at millions of steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.
## When to Use This Skill
Use this skill when:
- Training RL agents with PPO on any environment (single- or multi-agent)
- Creating custom environments using the PufferEnv API
- Optimizing performance for parallel environment simulation (vectorization)
- Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
- Developing policies with CNN, LSTM, or custom architectures
- Scaling RL to millions of steps per second for faster experimentation
- Multi-agent RL with native multi-agent environment support

## Core Capabilities

### 1. High-Performance Training (PuffeRL)
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
```bash
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py
```
Python training loop:
```python
import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results
```
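Since the policy is a plain PyTorch module, a simple checkpointing pattern can be added directly to this loop. The snippet below is a minimal sketch that uses standard `torch.save` rather than any PufferLib-specific checkpoint API; the interval and path are illustrative.

```python
import torch

CHECKPOINT_EVERY = 100  # illustrative interval

for iteration in range(num_iterations):
    trainer.evaluate()
    trainer.train()
    trainer.mean_and_log()

    # Save raw policy weights with plain PyTorch (not a PufferLib API)
    if iteration % CHECKPOINT_EVERY == 0:
        torch.save(my_policy.state_dict(), f'checkpoints/policy_{iteration:06d}.pt')
```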
For comprehensive training guidance, read references/training.md for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training
- Performance optimization tips
- Curriculum learning patterns

### 2. Environment Development (PufferEnv)
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)
        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}
        return obs, reward, done, info
```
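Before vectorizing, it is worth sanity-checking the environment with a short random-action rollout. The loop below is a minimal sketch that relies only on the reset()/step() signatures shown above (the helper methods _get_observation, _compute_reward, and _is_done are assumed to be implemented by your environment).

```python
# Quick local sanity check with random actions
env = MyEnvironment()
obs = env.reset()
for _ in range(100):
    action = np.random.randint(4)  # matches the discrete(4) action space above
    obs, reward, done, info = env.step(action)
    assert obs.shape == (4,)  # observation matches the declared space
    if done:
        obs = env.reset()
```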
Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:

- Different observation space types (vector, image, dict); see the space sketch after this list
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities
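As a concrete example of the space variations listed above, the sketch below defines a dict observation space and a multi-discrete action space with standard Gymnasium spaces. The shapes, and the assumption that your PufferEnv accepts Gymnasium spaces directly rather than the make_space/make_discrete helpers shown earlier, are illustrative.

```python
import numpy as np
import gymnasium.spaces as spaces

# Dict observation: an image view plus a flat state vector (illustrative shapes)
observation_space = spaces.Dict({
    'image': spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
    'state': spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32),
})

# Multi-discrete action: movement direction (4 options) plus an interact toggle (2 options)
action_space = spaces.MultiDiscrete([4, 2])
```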
For complete environment development, read references/environments.md for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques

### 3. Vectorization and Performance
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
```python
import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)
```
Performance benchmarks:
- Pure Python envs: 100k-500k SPS
- C-based envs: 100M+ SPS
- With training: 400k-4M total SPS
Key optimizations:
- Shared memory buffers for zero-copy observation passing (sketched below)
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker
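To make the zero-copy idea in the first point concrete, the sketch below writes each environment's observation in place into a slice of a pre-allocated buffer instead of returning a fresh array per step. The buffer layout and the compute_observation stand-in are illustrative assumptions, not PufferLib internals.

```python
import numpy as np

num_envs, obs_dim = 256, 4

# One shared, pre-allocated observation buffer for all environments (illustrative layout)
obs_buffer = np.zeros((num_envs, obs_dim), dtype=np.float32)

def compute_observation(env_id):
    # Stand-in for real environment logic (hypothetical)
    return np.full(obs_dim, env_id, dtype=np.float32)

def step_env(env_id, obs_out):
    # Write directly into the shared slice: no per-step allocation or copy
    obs_out[:] = compute_observation(env_id)

for env_id in range(num_envs):
    step_env(env_id, obs_buffer[env_id])
```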
For vectorization optimization, read references/vectorization.md for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting

### 4. Policy Development
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```
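For image observations, the same actor-critic pattern works with a convolutional encoder. The sketch below is a generic PyTorch architecture under the assumption of 84x84 RGB inputs in channel-first layout; it reuses layer_init from the example above but is not PufferLib's built-in CNN policy.

```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class CNNPolicy(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        # Convolutional encoder for (3, 84, 84) images (assumed input shape)
        self.encoder = nn.Sequential(
            layer_init(nn.Conv2d(3, 32, kernel_size=8, stride=4)),
            nn.ReLU(),
            layer_init(nn.Conv2d(32, 64, kernel_size=4, stride=2)),
            nn.ReLU(),
            layer_init(nn.Conv2d(64, 64, kernel_size=3, stride=1)),
            nn.ReLU(),
            nn.Flatten(),
            layer_init(nn.Linear(64 * 7 * 7, 512)),
            nn.ReLU(),
        )
        self.actor = layer_init(nn.Linear(512, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(512, 1), std=1.0)

    def forward(self, observations):
        # Scale uint8 pixels to [0, 1] before the conv stack
        features = self.encoder(observations.float() / 255.0)
        return self.actor(features), self.critic(features)
```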
For complete policy development, read references/policies.md for:
- CNN policies for image observations
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing

### 5. Environment Integration
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
```python
import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
```
PettingZoo multi-agent:
```python
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
```
Supported frameworks:
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...
For integration details, read references/integration.md for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging

## Quick Start Workflow

### For Training Existing Environments

1. Choose environment from Ocean suite or compatible framework
2. Use scripts/train_template.py as starting point
3. Configure hyperparameters for your task
4. Run training with CLI or Python script
5. Monitor with Weights & Biases or Neptune
6. Refer to references/training.md for optimization

### For Creating Custom Environments

1. Start with scripts/env_template.py
2. Define observation and action spaces
3. Implement reset() and step() methods
4. Test environment locally
5. Vectorize with pufferlib.emulate() or make()
6. Refer to references/environments.md for advanced patterns
7. Optimize with references/vectorization.md if needed

### For Policy Development

1. Choose architecture based on observations:
   - Vector observations → MLP policy
   - Image observations → CNN policy
   - Sequential tasks → LSTM policy
   - Complex observations → Multi-input policy
2. Use layer_init for proper weight initialization
3. Follow patterns in references/policies.md
4. Test with environment before full training

### For Performance Optimization

1. Profile current throughput (steps per second)
2. Check vectorization configuration (num_envs, num_workers)
3. Optimize environment code (in-place ops, numpy vectorization)
4. Consider C implementation for critical paths
5. Use references/vectorization.md for systematic optimization

## Resources

### scripts/
train_template.py - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup
env_template.py - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities

### references/
training.md - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting
environments.md - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging
vectorization.md - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices
policies.md - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing
integration.md - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration

## Tips for Success
- **Start simple**: Begin with Ocean environments or Gymnasium integration before creating custom environments
- **Profile early**: Measure steps per second from the start to identify bottlenecks
- **Use templates**: scripts/train_template.py and scripts/env_template.py provide solid starting points
- **Read references as needed**: Each reference file is self-contained and focused on a specific capability
- **Optimize progressively**: Start with Python, profile, then optimize critical paths with C if needed
- **Leverage vectorization**: PufferLib's vectorization is key to achieving high throughput
- **Monitor training**: Use WandB or Neptune to track experiments and identify issues early
- **Test environments**: Validate environment logic before scaling up training
- **Check existing environments**: The Ocean suite provides 20+ pre-built environments
- **Use proper initialization**: Always use layer_init from pufferlib.pytorch for policies
## Common Use Cases

### Training on Standard Benchmarks
```python
# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
```
### Multi-Agent Learning
```python
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
```
### Custom Task Development
```python
# Create custom environment
class MyTask(PufferEnv):
    ...  # implement reset() and step()

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
```
### High-Performance Optimization
```python
# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,        # Large batch
    num_workers=16,       # Many workers
    envs_per_worker=64    # Optimize per worker
)
```
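To check whether such settings actually pay off, time raw environment throughput before training. The loop below is a rough sketch that assumes a batched, Gymnasium-style reset()/step() interface returning the usual five-tuple and a per-env action_space.sample(); the exact signature of PufferLib's vectorized environments may differ.

```python
import time
import numpy as np

num_envs = 1024   # matches the make() call above
num_steps = 1_000

obs, infos = env.reset()
start = time.time()
for _ in range(num_steps):
    # Random batched actions (assumed interface)
    actions = np.array([env.action_space.sample() for _ in range(num_envs)])
    obs, rewards, terminals, truncations, infos = env.step(actions)
elapsed = time.time() - start

print(f'Throughput: {num_steps * num_envs / elapsed:,.0f} steps/second')
```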
## Installation

```bash
uv pip install pufferlib
```
## Documentation

- Official docs: https://puffer.ai/docs.html
- GitHub: https://github.com/PufferAI/PufferLib
- Discord: Community support available