TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training Paper • 2603.01714 • Published 4 days ago
SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent Paper • 2602.11551 • Published 23 days ago
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO Paper • 2602.06422 • Published 28 days ago • 44
Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? Paper • 2510.11184 • Published Oct 13, 2025 • 1
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification Paper • 2601.22642 • Published Jan 30 • 9
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32