facebook/metaclip-h14-fullcc2.5b Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 22k • 49
openai/clip-vit-large-patch14 Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 7.88M • 1.96k
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots Space • 13.8k
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13, 2025 • 17
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training Paper • 2503.08525 • Published Mar 11, 2025 • 17
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12, 2025 • 38
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions Paper • 2509.06951 • Published Sep 8, 2025 • 32
Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published Oct 2, 2025 • 43
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 42
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13, 2025 • 32
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification Paper • 2512.16921 • Published Dec 18, 2025 • 8
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation Paper • 2512.19134 • Published Dec 22, 2025 • 32
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 35
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published Jan 5, 2026 • 62
Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes Paper • 2601.02356 • Published Jan 5, 2026 • 14
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6, 2026 • 47
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models Paper • 2601.03699 • Published Jan 7, 2026 • 6
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers Paper • 2601.04890 • Published Jan 8, 2026 • 42
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Paper • 2601.10332 • Published 29 days ago • 28
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 14 days ago • 150
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published 10 days ago • 27