PRIMO R1
Collection
Official release of PRIMO R1, a 7B video MLLM for robotic process reasoning featuring RL-optimized models, SFT/RL datasets, and cross-domain benchmark • 3 items • Updated • 3
PRIMO R1 (Process Reasoning Induced Monitoring) is a 7B video multimodal large language model (MLLM) framework designed for accurate process supervision in long-horizon robotic manipulation. It was introduced in the paper From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation.
Current video MLLMs often function as passive "Observers" that recognize ongoing events rather than evaluating the current state relative to the final task goal. PRIMO R1 transforms these models into active "Critics" by:
PRIMO R1 achieves state-of-the-art performance across several benchmarks:
If you find our work helpful for your research, please consider citing our work.
@misc{liu2026passiveobserveractivecritic,
title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation},
author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
year={2026},
eprint={2603.15600},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.15600},
}
Base model
Qwen/Qwen2.5-VL-7B-Instruct