Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training Paper • 2511.07328 • Published 10 days ago • 13
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning Paper • 2604.04746 • Published Apr 8 • 71
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Paper • 2604.08476 • Published Apr 9 • 8
Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published Apr 9 • 20
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published Apr 9 • 49
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration Paper • 2603.24800 • Published Mar 25 • 68
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published Feb 15 • 56
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published Feb 6 • 83
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published Feb 3 • 59
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published Feb 6 • 74
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities Paper • 2602.05281 • Published Feb 5 • 14
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 324
Diversity Vs Density: A data strategy comparison for fine-tuning VLMs Article • Akhil-Theerthala • Jan 6 • 5
HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering Paper • 2512.14870 • Published Dec 16, 2025 • 15