AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents Paper • 2603.14465 • Published 6 days ago • 22
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment Paper • 2402.19085 • Published Feb 29, 2024
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning Paper • 2506.07851 • Published Jun 9, 2025
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Paper • 2601.21244 • Published Jan 29 • 12
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Paper • 2601.21244 • Published Jan 29 • 12