V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts Paper • 2603.10848 • Published 22 days ago • 14
Flash-KMeans: Fast and Memory-Efficient Exact K-Means Paper • 2603.09229 • Published 24 days ago • 82
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 190
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published Feb 10, 2025 • 22
RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling Paper • 2507.04416 • Published Jul 6, 2025 • 1
RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference Paper • 2602.18196 • Published Feb 20 • 1
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 23 days ago • 11
Test-Time Training with KV Binding Is Secretly Linear Attention Paper • 2602.21204 • Published Feb 24 • 30
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking Paper • 2602.21196 • Published Feb 24 • 6
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy Paper • 2602.17363 • Published Feb 19 • 8
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts Paper • 2602.13367 • Published Feb 13 • 35
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 10