Multi-agent cooperation through in-context co-player inference Paper • 2602.16301 • Published 3 days ago • 13
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation Paper • 2602.16705 • Published 3 days ago • 26
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models Paper • 2602.15772 • Published 4 days ago • 6
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 8 days ago • 48
Revisiting the Platonic Representation Hypothesis: An Aristotelian View Paper • 2602.14486 • Published 5 days ago • 9
Visual Persuasion: What Influences Decisions of Vision-Language Models? Paper • 2602.15278 • Published 5 days ago • 3
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Paper • 2602.14299 • Published 6 days ago • 24
Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings Paper • 2602.13823 • Published 7 days ago • 9
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 6 days ago • 42
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 11 days ago • 219
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis Paper • 2602.12395 • Published 9 days ago • 14
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published 22 days ago • 192
Quantifying the Gap between Understanding and Generation within Unified Multimodal Models Paper • 2602.02140 • Published 19 days ago • 12
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 72
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published Dec 23, 2025 • 16
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119