llm-training

Installs: 34
Rank: #19931

Install

npx skills add https://github.com/eyadsibai/ltk --skill llm-training
# LLM Training

Frameworks and techniques for training and finetuning large language models.
## Framework Comparison

| Framework | Best For | Multi-GPU | Memory Efficient |
|---|---|---|---|
| Accelerate | Simple distributed | Yes | Basic |
| DeepSpeed | Large models, ZeRO | Yes | Excellent |
| PyTorch Lightning | Clean training loops | Yes | Good |
| Ray Train | Scalable, multi-node | Yes | Good |
| TRL | RLHF, reward modeling | Yes | Good |
| Unsloth | Fast LoRA finetuning | Limited | Excellent |
## Accelerate (HuggingFace)

Minimal wrapper for distributed training. Run `accelerate config` for interactive setup.

**Key concept:** Wrap the model, optimizer, and dataloader with `accelerator.prepare()`, and use `accelerator.backward(loss)` in place of `loss.backward()`.
## DeepSpeed (Large Models)

Microsoft's optimization library for training massive models.

**ZeRO Stages:**
- **Stage 1** – Optimizer states partitioned across GPUs
- **Stage 2** – Gradients partitioned as well
- **Stage 3** – Parameters partitioned too (for the largest models, 100B+)

**Key concept:** Configure via JSON; higher stages mean more memory savings but more communication overhead.
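An illustrative ZeRO-2 config sketch; the specific batch sizes and flags here are assumptions for demonstration, not recommendations:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Switching `"stage"` to 3 additionally partitions the parameters themselves, at the cost of extra all-gather communication during forward and backward.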
## TRL (RLHF/DPO)

HuggingFace library for reinforcement learning from human feedback.

**Training types:**
- **SFT (Supervised Finetuning)** – Standard instruction tuning
- **DPO (Direct Preference Optimization)** – Simpler than RLHF; uses preference pairs
- **PPO** – Classic RLHF with a reward model

**Key concept:** DPO is often preferred over PPO: it is simpler, needs no reward model, and only requires chosen/rejected response pairs.
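A sketch of the chosen/rejected data format DPO consumes; the example records are made up, and the commented trainer call is a rough outline whose exact API varies across TRL versions:

```python
# DPO preference data: each record pairs one prompt with a preferred
# ("chosen") and a dispreferred ("rejected") response. No reward model
# is trained; the pairs themselves carry the preference signal.
preference_data = [
    {
        "prompt": "Explain gradient checkpointing in one sentence.",
        "chosen": "It trades compute for memory by recomputing activations "
                  "during the backward pass.",
        "rejected": "It is a way to save model checkpoints to disk.",
    },
]

# Hedged sketch of feeding this to TRL (details differ by version):
# from trl import DPOTrainer, DPOConfig
# trainer = DPOTrainer(model=model, args=DPOConfig(output_dir="out"),
#                      train_dataset=dataset, processing_class=tokenizer)
# trainer.train()

for record in preference_data:
    assert set(record) == {"prompt", "chosen", "rejected"}
```

Compare this to PPO, which would need a separately trained reward model plus on-policy generation during training.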
## Unsloth (Fast LoRA)

Optimized LoRA finetuning: roughly 2x faster with about 60% less memory.

**Key concept:** Drop-in replacement for standard LoRA with automatic optimizations. Best for 7B-13B models.

## Memory Optimization Techniques

| Technique | Memory Savings | Trade-off |
|---|---|---|
| Gradient checkpointing | ~30-50% | Slower training |
| Mixed precision (fp16/bf16) | ~50% | Minor precision loss |
| 4-bit quantization (QLoRA) | ~75% | Some quality loss |
| Flash Attention | ~20-40% | Requires compatible GPU |
| Gradient accumulation | Effective batch size ↑ | No memory cost |

## Decision Guide

| Scenario | Recommendation |
|---|---|
| Simple finetuning | Accelerate + PEFT |
| 7B-13B models | Unsloth (fastest) |
| 70B+ models | DeepSpeed ZeRO-3 |
| RLHF/DPO alignment | TRL |
| Multi-node cluster | Ray Train |
| Clean code structure | PyTorch Lightning |

## Resources

- Accelerate: https://huggingface.co/docs/accelerate
- DeepSpeed: https://www.deepspeed.ai/
- TRL: https://huggingface.co/docs/trl
- Unsloth: https://github.com/unslothai/unsloth
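As one concrete instance of the memory techniques above, gradient accumulation can be sketched in a few lines; the toy model and step counts are illustrative:

```python
# Gradient accumulation: run N small backward passes before one optimizer
# step, giving an effective batch of micro_batch * N with no extra memory.
import torch
from torch import nn

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4  # effective batch = 4 micro-batches
optimizer.zero_grad()
for step in range(8):
    x, y = torch.randn(2, 8), torch.randn(2, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()  # gradients accumulate in .grad across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` makes the accumulated gradient equal the average over the effective batch, matching what a single large batch would produce.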