Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model 8 days ago • 25
Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models Jan 6 • 23
The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Dec 17, 2025 • 47
Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models Dec 15, 2025 • 107
How to Build a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac for Healthcare Oct 28, 2025 • 20
NVIDIA Releases 8 Million Sample Open Dataset and Tooling for OCR, Image Reasoning, Image and Video QA Tasks Oct 28, 2025 • 17
Cosmos Predict 2.5 & Transfer 2.5: Evolving the World Foundation Models for Physical AI Oct 28, 2025 • 21
Nemotron’s Open Secret: Accelerating AI Development with Open Models, Data, and Recipes Oct 22, 2025 • 11
Llama‑Embed‑Nemotron‑8B Text Embedding Model Ranks First on Multilingual MTEB Leaderboard Oct 21, 2025 • 14
Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models Oct 20, 2025 • 20
📢 NVIDIA Releases Nemotron-CC-Math Pre-Training Dataset: A High-Quality, Web-Scale Math Corpus for Pretraining Large Language Models Aug 18, 2025 • 5
NVIDIA Releases Improved Pretraining Dataset: Preserves High Value Math & Code, and Augments with Multi-Lingual Aug 18, 2025 • 4
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks Aug 11, 2025 • 75
Llama-NeMoRetriever-ColEmbed: Developer-Focused Guide to NVIDIA's State-of-the-Art Text-Image Retrieval Jul 9, 2025 • 4
Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10, 2025 • 22
Submitted by taesiri 28 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos NVIDIA 1
Submitted by Hyunwoo Kim 9 Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch NVIDIA 10 3
Submitted by Ximing Lu 96 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text NVIDIA 5
Submitted by Alex Chiu 6 FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning NVIDIA 2
Submitted by taesiri 14 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning NVIDIA 2
Submitted by Haocheng Xi 21 Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow NVIDIA 3
Submitted by Chi-Pin Huang 51 Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning NVIDIA 2
Submitted by taesiri 25 OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding NVIDIA 4
Submitted by LIU Shih-yang 225 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization NVIDIA 387 9
Submitted by taesiri 45 NitroGen: An Open Foundation Model for Generalist Gaming Agents NVIDIA 1.79k 3
Submitted by taesiri 13 SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling NVIDIA 4
Submitted by JaesungChoe 16 Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting NVIDIA 3
Submitted by taesiri 37 Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning NVIDIA 411 3
Submitted by Byung-Kwan Lee 27 Masking Teacher and Reinforcing Student for Distilling Vision-Language Models NVIDIA 3
Submitted by Min-Hung Chen 47 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation NVIDIA 2
Submitted by Wei Du 9 Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision NVIDIA 1
Submitted by Ryo Hachiuma 9 Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in NVIDIA 1
Submitted by taesiri 15 Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed NVIDIA 1
Submitted by Wei Ping 34 Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models NVIDIA 1
Submitted by Siyi Chen 23 SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL NVIDIA 10 2
Submitted by Shizhe Diao 124 ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration NVIDIA 649 5
Submitted by Yonggan Fu 34 Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models NVIDIA 2
Submitted by Min-Hung Chen 6 VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models NVIDIA 3
Submitted by Yauhen Babakhin 13 Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks NVIDIA 2
Submitted by Huck Yang 7 Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale NVIDIA 2
Submitted by Byung-Kwan Lee 31 Unified Reinforcement and Imitation Learning for Vision-Language Models NVIDIA 7
Submitted by Shizhe Diao 8 ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge NVIDIA 28 2
Submitted by taesiri 91 OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM NVIDIA 635 4
Submitted by Min-Hung Chen 16 DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning NVIDIA 14 3
Submitted by Ankit Goyal 15 VLA-0: Building State-of-the-Art VLAs with Zero Modification NVIDIA 439 3
Submitted by Wei Huang 180 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs NVIDIA 484 5
Submitted by Min-Hung Chen 8 TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control NVIDIA 2
Submitted by Jay Wu 20 ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation NVIDIA 671 2
Submitted by Han Cai 39 DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder NVIDIA 179 2
Submitted by Han Cai 9 DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space NVIDIA 345 2
Submitted by Yuyang 46 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer NVIDIA 4.96k 2
Submitted by Shrimai Prabhumoye 24 Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data NVIDIA 4
Submitted by Zhilin Wang 7 RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards NVIDIA 2
Submitted by Min-Hung Chen 3 V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts NVIDIA 14 4
Submitted by Chi-Pin Huang 41 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning NVIDIA 1
Submitted by Byung-Kwan Lee 41 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models NVIDIA 2
Submitted by Min-Hung Chen 5 V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models NVIDIA 11 5
Submitted by Min-Hung Chen 33 Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks NVIDIA 2
Submitted by Byung-Kwan Lee 15 VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models NVIDIA 2
Submitted by Pavlo Molchanov 46 Hymba: A Hybrid-head Architecture for Small Language Models NVIDIA 208 3
Submitted by Min-Hung Chen 7 EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation NVIDIA 27 2