LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published 5 days ago • 16
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published 5 days ago • 16
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 14 days ago • 65
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 49
Adding Conditional Control to Text-to-Image Diffusion Models Paper • 2302.05543 • Published Feb 10, 2023 • 58