- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 124
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 50
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 11
Hleb Stenin
halaction
AI & ML interests
None yet
Recent Activity
updated a collection about 5 hours ago: Reading List
liked a dataset 4 days ago: trl-lib/DeepMath-103K
updated a collection 4 days ago: Reading List
Organizations
None yet