Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper
• 2510.20150
• Published
• 6
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper
• 2511.06221
• Published
• 132
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
• 2508.10433
• Published
• 144
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
• Published
• 105
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
• Published
• 90
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper
• 2512.24138
• Published
• 29
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
• 2601.07348
• Published
• 114
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Paper
• 2601.14243
• Published
• 22
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Paper
• 2601.16973
• Published
• 40
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
Paper
• 2601.11258
• Published
• 9
RL's Razor: Why Online Reinforcement Learning Forgets Less
Paper
• 2509.04259
• Published
• 6
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published
• 139