From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Paper: arXiv:2602.22859
Qwen2.5-VL-7B-Instruct_DPE_v2 is the second-iteration model evolved through the DPE framework, building upon the foundations of v1.
DPE uses a multi-agent system to diagnose the model's failure patterns and to generate high-quality training data that targets them.
| Category | Benchmark | Base Model | DPE_v2 (Ours) | Improvement |
|---|---|---|---|---|
| STEM | MMMU | 53.11 | 55.33 | +2.22 |
| | RealWorldQA | 68.63 | 69.54 | +0.91 |
| Visual Math | MathVista | 65.50 | 68.20 | +2.70 |
| Specialized | HallusionBench | 64.98 | 69.19 | +4.21 |
| Overall | Average | 57.29 | 58.68 | +1.39 |
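
The Improvement column is the absolute difference between the DPE_v2 score and the base-model score. The sketch below recomputes it from the rows above and also derives a relative gain, which the card does not report. The helper names (`improvement`, `relative_gain`, `check_rows`) are illustrative and not part of any released code. The Average row is copied as reported; the mean of the four listed base scores is about 63.06, so it presumably covers more benchmarks than those shown here.

```python
# Scores copied from the results table: (benchmark, base, dpe_v2, reported_gain).
ROWS = [
    ("MMMU", 53.11, 55.33, 2.22),
    ("RealWorldQA", 68.63, 69.54, 0.91),
    ("MathVista", 65.50, 68.20, 2.70),
    ("HallusionBench", 64.98, 69.19, 4.21),
    ("Average", 57.29, 58.68, 1.39),
]


def improvement(base, tuned):
    """Absolute gain in points, rounded to two decimals like the table."""
    return round(tuned - base, 2)


def relative_gain(base, tuned):
    """Gain as a percentage of the base score (not reported in the card)."""
    return round(100 * (tuned - base) / base, 2)


def check_rows(rows, tol=1e-6):
    """Map each benchmark to True if the reported gain matches tuned - base."""
    return {
        name: abs((tuned - base) - reported) < tol
        for name, base, tuned, reported in rows
    }


if __name__ == "__main__":
    for name, base, tuned, _ in ROWS:
        print(f"{name}: +{improvement(base, tuned):.2f} points, "
              f"{relative_gain(base, tuned):.2f}% relative")
    print("all rows consistent:", all(check_rows(ROWS).values()))
```

Relative gain can be a useful complement when base scores differ widely across benchmarks, since the same absolute gain means more on a benchmark with a lower starting score.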
@misc{jia2026blindspotsgainsdiagnosticdriven,
title={From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models},
author={Hongrui Jia and Chaoya Jiang and Shikun Zhang and Wei Ye},
year={2026},
eprint={2602.22859},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.22859},
}
This model follows the Qwen Research License.