Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published 11 days ago • 67
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering Paper • 2603.18558 • Published 6 days ago • 10