MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data • Paper • 2603.09206 • Published 3 days ago • 41
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing • Paper • 2603.09877 • Published 3 days ago • 36
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs • Paper • 2603.09906 • Published 3 days ago • 56
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories • Paper • 2603.02049 • Published 11 days ago • 17
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection • Paper • 2603.00912 • Published 12 days ago • 36
The Trinity of Consistency as a Defining Principle for General World Models • Paper • 2602.23152 • Published 15 days ago • 196
VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale • Paper • 2602.23361 • Published 15 days ago • 14
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents • Paper • 2602.23205 • Published 15 days ago • 11
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning • Paper • 2602.21534 • Published 16 days ago • 23
PyVision-RL: Forging Open Agentic Vision Models via RL • Paper • 2602.20739 • Published 17 days ago • 29
On Data Engineering for Scaling LLM Terminal Capabilities • Paper • 2602.21193 • Published 17 days ago • 94
SkillOrchestra: Learning to Route Agents via Skill Transfer • Paper • 2602.19672 • Published 18 days ago • 55
SimVLA: A Simple VLA Baseline for Robotic Manipulation • Paper • 2602.18224 • Published 21 days ago • 5
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction • Paper • 2602.20160 • Published 18 days ago • 10
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents • Paper • 2602.16855 • Published 26 days ago • 48