SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025 • 124
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 50