HeartofSheep's Collections
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published • 32
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Paper
• 2503.12329
• Published • 27
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper
• 2503.10639
• Published • 53
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published • 207
Paper
• 2504.07491
• Published • 138
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Paper
• 2505.04921
• Published • 187
Seed1.5-VL Technical Report
Paper
• 2505.07062
• Published • 157
MMaDA: Multimodal Large Diffusion Language Models
Paper
• 2505.15809
• Published • 98
OmniGen2: Exploration to Advanced Multimodal Generation
Paper
• 2506.18871
• Published • 78