Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 11 days ago • 49
ΔL Normalization: Rethink Loss Aggregation in RLVR Paper • 2509.07558 • Published Sep 9, 2025 • 7
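To situate the title: RLVR responses vary widely in length, and the way per-token losses are pooled changes how much each response weighs in the update. The sketch below contrasts the two common baselines, sequence-level mean versus token-level mean; it is a generic illustration of the aggregation question, not the paper's ΔL scheme.

    import torch

    torch.manual_seed(0)
    lens = [3, 10]                                  # two responses of different lengths
    losses = [torch.randn(n).abs() for n in lens]   # stand-in per-token RLVR losses

    # Sequence-level aggregation: every response counts equally.
    seq_mean = torch.stack([l.mean() for l in losses]).mean()
    # Token-level aggregation: longer responses carry more weight.
    token_mean = torch.cat(losses).mean()
    print(f"sequence-mean={seq_mean.item():.3f}  token-mean={token_mean.item():.3f}")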
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5, 2025 • 140
LeanK: Learnable K Cache Channel Pruning for Efficient Decoding Paper • 2508.02215 • Published Aug 4, 2025 • 12
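To make the LeanK title concrete, here is a minimal sketch of K cache channel pruning: key channels are scored on a calibration pass, and decoding attends over only the kept subset. The norm-based importance score and top-k selection are illustrative assumptions; per the title, LeanK learns its channel mask rather than using this heuristic.

    import torch

    def kept_channels(K_calib: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        # Score each key channel by its L2 norm over a calibration pass
        # and keep the top fraction (illustrative heuristic only).
        importance = K_calib.norm(dim=0)
        k = max(1, int(keep_ratio * K_calib.shape[1]))
        return importance.topk(k).indices

    def pruned_scores(q: torch.Tensor, K_cache: torch.Tensor, kept: torch.Tensor) -> torch.Tensor:
        # Attention logits over only the kept channels, so per-step
        # K-cache reads shrink by roughly (1 - keep_ratio).
        return (K_cache[:, kept] @ q[kept]) / kept.numel() ** 0.5

    torch.manual_seed(0)
    K = torch.randn(128, 64)                        # 128 cached keys, 64 channels
    q = torch.randn(64)
    kept = kept_channels(K, keep_ratio=0.5)
    approx, full = pruned_scores(q, K, kept), (K @ q) / 64 ** 0.5
    print("corr with full attention logits:",
          torch.corrcoef(torch.stack([approx, full]))[0, 1].item())

With half the channels dropped, the pruned logits stay well correlated with the full ones even on this random toy data, which is the intuition for why channel pruning can approach lossless decoding.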
TriangleMix: A Lossless and Efficient Attention Pattern for Long Context Prefilling Paper • 2507.21526 • Published Jul 29, 2025
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models Paper • 2404.01617 • Published Apr 2, 2024 • 8
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely Paper • 2409.14924 • Published Sep 23, 2024 • 2
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs Paper • 2505.12929 • Published May 19, 2025 • 3
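The imbalance this last title warns about can be shown directly: for a cross-entropy-style policy-gradient term, the gradient on the logits is softmax(z) minus the one-hot of the sampled token, so its norm approaches sqrt(2) when the token's probability is low and vanishes when it is high. The toy logits and unit advantage below are my simplifications; the paper's mitigation is not reproduced.

    import torch
    import torch.nn.functional as F

    def logit_grad_norm(logits: torch.Tensor, token: int) -> float:
        # Gradient norm of -log pi(token) w.r.t. the logits (advantage = 1).
        z = logits.clone().requires_grad_(True)
        (-F.log_softmax(z, dim=-1)[token]).backward()
        return z.grad.norm().item()

    logits = torch.tensor([5.0, 0.0, 0.0, -5.0])    # toy 4-token vocabulary
    probs = F.softmax(logits, dim=-1)
    for tok in (0, 3):                              # high- vs low-probability token
        print(f"p={probs[tok].item():.1e}  grad_norm={logit_grad_norm(logits, tok):.3f}")

Here the low-probability token (index 3) receives a gradient roughly sixty times larger than the high-probability one (index 0), so a handful of such tokens can dominate an RL update.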