VLMs - a HeartofSheep Collection

HeartofSheep 's Collections

Image Representation

VLMs

updated Jul 3, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17, 2025 • 32
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 27
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10, 2025 • 138
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8, 2025 • 187
Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11, 2025 • 157
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 98
OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23, 2025 • 78