AlayaRenderer — Generative World Renderer Skill by ara.so — Daily 2026 Skills collection. AlayaRenderer is a two-stage framework for high-quality video rendering: Inverse Renderer (RGB → G-buffers): Extracts albedo, normal, depth, roughness, and metallic maps from RGB video using a fine-tuned Cosmos-Transfer1-DiffusionRenderer 7B model. Game Editing (G-buffers + Text → Stylized RGB): Synthesizes photorealistic, stylized RGB video from G-buffer inputs using a fine-tuned Wan2.1 1.3B model via DiffSynth-Studio. Installation Clone the Repository git clone --recurse-submodules https://github.com/ShandaAI/AlayaRenderer.git cd AlayaRenderer Important: Use --recurse-submodules — DiffSynth-Studio is a git submodule required for Game Editing. Two Separate Conda Environments (Recommended) The two models have conflicting dependencies. Use separate environments:
Environment 1: Inverse Renderer
conda create -n inverse_renderer python = 3.10 -y conda activate inverse_renderer cd inverse_renderer
Follow inverse_renderer/ instructions for Cosmos-Transfer1 setup
Environment 2: Game Editing
conda create -n game_editing python = 3.10 -y conda activate game_editing cd game_editing
Follow DiffSynth-Studio setup instructions
Model Weights Model Base Model Size HuggingFace Link Inverse Renderer Cosmos-Transfer1-DiffusionRenderer 7B ~7B params Brian9999/world_inverse_renderer Game Editing Wan2.1 1.3B ~1.3B params Brian9999/stylerenderer Download and Place Weights
Inverse Renderer — replace the base checkpoint
huggingface-cli download Brian9999/world_inverse_renderer \ --local-dir inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B
Game Editing — place in game_editing models directory
mkdir -p game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer huggingface-cli download Brian9999/stylerenderer \ --local-dir game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer Inverse Renderer Usage The inverse renderer decomposes an RGB video into 5 G-buffer channels: albedo, normal, depth, roughness, metallic . Setup cd inverse_renderer
Follow Cosmos-Transfer1-DiffusionRenderer environment setup
Ensure checkpoint is at:
inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/
Inference Refer to the inverse_renderer/ subdirectory for the full inference script. The general pattern follows Cosmos-Transfer1-DiffusionRenderer conventions:
inverse_renderer/run_inverse.py (typical pattern)
import torch from pathlib import Path
Input: path to RGB video
input_video
"path/to/rgb_video.mp4" output_dir = "outputs/gbuffers/"
The model outputs 5 synchronized channels:
- albedo (diffuse color)
- normal (surface orientation)
- depth (scene geometry)
- roughness (surface roughness)
- metallic (metallic property)
Game Editing Usage Quick Start — CLI Inference cd game_editing CUDA_VISIBLE_DEVICES = 0 python \ examples/wanvideo/model_inference/inference_gbuffer_caption.py \ --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \ --gpu 0 \ --style snowy_winter \ --prompt "the scene is set in a frozen, snow-covered environment under cold, pale winter light with falling snowflakes, creating a silent and ethereal winter wonderland atmosphere." \ --gbuffer_dir test_dataset \ --save_dir outputs/ \ --num_frames 81 \ --height 480 \ --width 832 CLI Parameters Parameter Description Example --checkpoint Path to fine-tuned .safetensors weights models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors --gpu GPU device index 0 --style Named style preset snowy_winter , rainy , night , sunset --prompt Text description of target lighting/atmosphere See examples below --gbuffer_dir Directory containing G-buffer input frames/video test_dataset --save_dir Output directory for rendered video outputs/ --num_frames Number of frames to generate (must be 8n+1 ) 81 --height Output height in pixels 480 --width Output width in pixels 832 G-buffer Directory Structure test_dataset/ ├── albedo/ │ ├── frame_0000.png │ ├── frame_0001.png │ └── ... ├── normal/ │ ├── frame_0000.png │ └── ... ├── depth/ │ ├── frame_0000.png │ └── ... ├── roughness/ │ ├── frame_0000.png │ └── ... └── metallic/ ├── frame_0000.png └── ... Style Prompt Examples
Cyberpunk night scene
--style night \ --prompt "neon-lit urban environment at night with rain-slicked streets reflecting colorful neon signs, creating a cyberpunk noir atmosphere"
Golden hour / sunset
--style sunset \ --prompt "warm golden hour lighting with long shadows and a glowing amber sky, soft cinematic atmosphere"
Rainy urban
--style rainy \ --prompt "overcast rainy day with wet surfaces, soft diffuse lighting, and atmospheric fog creating a moody cinematic look"
Fantasy / stylized
--style fantasy \ --prompt "magical forest environment with bioluminescent plants, ethereal blue-green lighting, and mystical particle effects"
Foggy morning
--style foggy \ --prompt "early morning dense fog with soft diffused light creating a mysterious and quiet atmosphere" Multi-GPU Inference
Run on specific GPU
CUDA_VISIBLE_DEVICES
1 python \ examples/wanvideo/model_inference/inference_gbuffer_caption.py \ --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \ --gpu 1 \ --style rainy \ --prompt "heavy rainfall with dark storm clouds and dramatic lightning in the distance" \ --gbuffer_dir my_gbuffers \ --save_dir outputs/rainy_scene \ --num_frames 81 --height 480 --width 832 Full Pipeline: RGB Video → Stylized Output
Step 1: Extract G-buffers from RGB video (Inverse Renderer env)
conda activate inverse_renderer cd inverse_renderer python run_inverse.py \ --input path/to/gameplay_video.mp4 \ --output_dir .. /game_editing/test_dataset/
Step 2: Apply game editing style (Game Editing env)
- conda activate game_editing
- cd
- ..
- /game_editing
- CUDA_VISIBLE_DEVICES
- =
- 0
- python
- \
- examples/wanvideo/model_inference/inference_gbuffer_caption.py
- \
- --checkpoint
- models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors
- \
- --gpu
- 0
- \
- --style
- snowy_winter
- \
- --prompt
- "frozen tundra with blizzard conditions, pale blue-white lighting and drifting snow"
- \
- --gbuffer_dir
- test_dataset
- \
- --save_dir
- outputs/final_render
- \
- --num_frames
- 81
- --height
- 480
- --width
- 832
- Online Demos
- Demo
- URL
- Game Editing Demo
- https://huggingface.co/spaces/Brian9999/game-editing
- Project Page
- https://alaya-studio.github.io/renderer/
- Dataset Overview
- The AlayaRenderer dataset (release pending) features:
- 4M+ frames
- at 720p / 30 FPS
- 6 synchronized channels
- RGB + albedo, normal, depth, metallic, roughness 40 hours from Cyberpunk 2077 and Black Myth: Wukong Average clip length: 8 minutes , up to 53 minutes continuous Weather variants: sunny, rainy, foggy, night, sunset Motion blur variant via sub-frame interpolation Architecture Summary RGB Video Input │ ▼ ┌─────────────────────────────────────┐ │ Inverse Renderer │ │ (Cosmos-Transfer1 7B fine-tuned) │ │ RGB → [albedo, normal, depth, │ │ roughness, metallic] │ └─────────────────┬───────────────────┘ │ G-buffers ▼ ┌─────────────────────────────────────┐ │ Game Editing │ │ (Wan2.1 1.3B fine-tuned) │ │ G-buffers + Text Prompt │ │ → Stylized RGB Video │ └─────────────────────────────────────┘ Troubleshooting Submodule not found / DiffSynth-Studio missing
If cloned without --recurse-submodules:
git submodule update --init --recursive CUDA Out of Memory Reduce --num_frames (try 41 instead of 81 ) Reduce resolution: --height 320 --width 576 Ensure no other processes are using the GPU: CUDA_VISIBLE_DEVICES=0 num_frames must follow 8n+1 pattern Valid values: 9, 17, 25, 33, 41, 49, 57, 65, 73, 81
Valid
--num_frames 81
8*10 + 1 ✓
--num_frames 41
8*5 + 1 ✓
Invalid
--num_frames 80
✗
--num_frames 60
✗
Checkpoint not found
Verify checkpoint placement
ls game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors ls inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/ Version conflicts between models Always use the two separate conda environments ( inverse_renderer and game_editing ). Do not install both models' dependencies in one environment. Citation @article{huang2026generativeworldrenderer, title={Generative World Renderer}, author={Zheng-Hui Huang and Zhixiang Wang and Jiaming Tan and Ruihan Yu and Yidan Zhang and Bo Zheng and Yu-Lun Liu and Yung-Yu Chuang and Kaipeng Zhang}, journal={arXiv preprint arXiv:2604.02329}, year={2026} }