Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models Paper • 2603.13985 • Published 6 days ago • 9
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 443