Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation Paper • 2603.18795 • Published 7 days ago • 12
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14, 2025 • 112
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published Feb 24, 2025 • 14 • 3
LRM: Large Reconstruction Model for Single Image to 3D Paper • 2311.04400 • Published Nov 8, 2023 • 52