RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published 8 days ago • 12
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 7 days ago • 10
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning Paper • 2603.10160 • Published 7 days ago • 24
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published 13 days ago • 195
Automatic Generation of High-Performance RL Environments Paper • 2603.12145 • Published 5 days ago • 6
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training Paper • 2603.10444 • Published 7 days ago • 10
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges Paper • 2603.11863 • Published 5 days ago • 5
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published 7 days ago • 8
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training Paper • 2603.12246 • Published 5 days ago • 4
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Paper • 2603.05863 • Published 12 days ago • 5
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness Paper • 2603.09200 • Published 8 days ago • 5
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs Paper • 2603.09906 • Published 7 days ago • 67
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer Paper • 2603.03583 • Published 14 days ago • 2
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models Paper • 2603.07777 • Published 9 days ago • 5
Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems Paper • 2603.07779 • Published 9 days ago • 5
Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training Paper • 2603.07223 • Published 10 days ago • 13
NLE: Non-autoregressive LLM-based ASR by Transcript Editing Paper • 2603.08397 • Published 8 days ago • 21