VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 14 days ago • 167
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 16 days ago • 175
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 14 days ago • 228
WTF GENIUS PAPERS Collection Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are about improving tiny models. • 67 items • Updated 8 days ago • 8
TranslateGemma VLLM Collection Modified version of google/translategemma-4/12/27b-it optimized for deployment with vLLM. • 3 items • Updated 2 days ago • 2
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation Paper • 2512.20908 • Published Dec 24, 2025 • 29
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 20 days ago • 333
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents Paper • 2601.03236 • Published Jan 6 • 7
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 228
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 27 days ago • 156