TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation Paper • 2603.19039 • Published 5 days ago
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models Paper • 2603.19466 • Published 4 days ago
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals Paper • 2505.21062 • Published May 27, 2025
Specificity-aware reinforcement learning for fine-grained open-world classification Paper • 2603.03197 • Published 20 days ago
Large Multimodal Models as General In-Context Classifiers Paper • 2602.23229 • Published 26 days ago
How to Take a Memorable Picture? Empowering Users with Actionable Feedback Paper • 2602.21877 • Published 27 days ago
Loomis Painter: Reconstructing the Painting Process Paper • 2511.17344 • Published Nov 21, 2025
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024
Outline-Guided Object Inpainting with Diffusion Models Paper • 2402.16421 • Published Feb 26, 2024
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published Jun 2, 2025