Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 216 • 7
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18, 2025 • 10
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28, 2025 • 38
BELLE-2/Belle-whisper-large-v3-turbo-zh Automatic Speech Recognition • 0.8B • Updated Dec 16, 2024 • 739 • 74
Running on Zero • MCP • Background Removal • 2.8k • Remove image backgrounds and get transparent PNGs
Runtime error • Agents • Dit Document Layout Analysis • 13 • Analyze document layout by uploading images
Running • Agents • Document Layout Analysis • 25 • Segment document layouts into text, images, and tables
Running on Zero • Agents • Document Layout Comparison • 7 • Analyze document layout to identify text and elements