OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published 14 days ago • 148
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published 19 days ago • 151
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 24 days ago • 117
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 27 days ago • 102
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 18 days ago • 64
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory Paper • 2603.03269 • Published 27 days ago • 61
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Paper • 2603.06577 • Published 24 days ago • 48
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising Paper • 2603.08703 • Published 21 days ago • 31
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs Paper • 2603.09095 • Published 21 days ago • 28
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation Paper • 2603.25702 • Published 4 days ago • 4
MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution Paper • 2603.18718 • Published 12 days ago • 8
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition Paper • 2603.13904 • Published 17 days ago • 3
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 5 days ago • 89
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent Paper • 2603.13875 • Published 17 days ago • 34
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents Paper • 2602.02474 • Published Feb 2 • 62
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 102
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse Paper • 2512.14531 • Published Dec 16, 2025 • 15
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published 5 days ago • 41
Understanding the Challenges in Iterative Generative Optimization with LLMs Paper • 2603.23994 • Published 6 days ago • 23
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 7 days ago • 130
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 6 days ago • 58
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published 17 days ago • 84
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published 7 days ago • 45
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 13 days ago • 106
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck Paper • 2603.08462 • Published 22 days ago • 21