QED-Nano: Teaching a Tiny Model to Prove Hard Theorems. Who needs 1T parameters? Olympiad proofs with a 4B model.
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning (Paper 2602.01058, published 20 days ago)