Qwen/Qwen3-VL-30B-A3B-Instruct Image-Text-to-Text • 31B • Updated Nov 26, 2025 • 1.5M • 533
Running Featured 560 Vision Arena (Testing VLMs side-by-side) • Analyze images with multiple vision models for labels and boxes
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 133
Running on CPU Upgrade Featured 2.99k The Smol Training Playbook • The secrets to building world-class LLMs
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • Updated Dec 31, 2025 • 164
yayayaaa/florence-2-large-ft-moredetailed Image-to-Text • 0.8B • Updated Dec 13, 2025 • 100 • 16
Runtime error Featured 515 Florence2 + SAM2 • Segment and caption objects in images and videos