TextPecker-8B-InternVL3
TextPecker-8B-InternVL3 is an evaluator model presented in the paper "TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering".
Standard multimodal LLMs often miss fine-grained text errors in generated images, such as distorted or misaligned characters. TextPecker is designed to perceive and quantify these structural anomalies so that it can provide reliable reward signals for RL-based optimization of text-to-image models.
This checkpoint is based on the InternVL3-8B-Instruct architecture and was trained using the ms-swift framework on the TextPecker-1.5M dataset.
Model Details
- Developed by: Hanshen Zhu, Yuliang Liu, et al. (Huazhong University of Science & Technology and ByteDance)
- Model Type: Multimodal Large Language Model (MLLM)
- Base Model: OpenGVLab/InternVL3-8B-Instruct
- Task: Image-to-Text (Structural Anomaly Perception / OCR Evaluator)
- License: Apache 2.0
Model Sources
- Repository: https://github.com/CIawevy/TextPecker
- Paper: https://huggingface.co/papers/2602.20903
- Dataset: CIawevy/TextPecker-1.5M
Uses
TextPecker can be used to evaluate the structural quality and semantic consistency of text rendered in generated or edited images. It supports Visual Text Rendering (VTR) optimization by providing reliable feedback on character-level structural fidelity.
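As a rough illustration of how structural-quality and semantic-consistency feedback could be folded into a single scalar reward, consider the toy sketch below. It is not the reward formulation from the paper and does not reflect the model's actual output format. The `#` anomaly marker, the function names, and the equal weighting are all hypothetical choices made only for this example.

```python
from difflib import SequenceMatcher

# Hypothetical convention: the evaluator's transcription uses "#" in place of
# any character it judged structurally broken (distorted, fused, misaligned).
ANOMALY_MARK = "#"


def structural_quality(recognized: str) -> float:
    """Fraction of non-space characters that are NOT flagged as anomalous."""
    chars = [c for c in recognized if not c.isspace()]
    if not chars:
        return 0.0
    flagged = sum(1 for c in chars if c == ANOMALY_MARK)
    return 1.0 - flagged / len(chars)


def semantic_consistency(recognized: str, target: str) -> float:
    """Case-insensitive similarity between recognized and intended text."""
    return SequenceMatcher(None, recognized.lower(), target.lower()).ratio()


def toy_reward(recognized: str, target: str, weight: float = 0.5) -> float:
    """Weighted mix of the two scores; the weight is an arbitrary assumption."""
    return (weight * structural_quality(recognized)
            + (1.0 - weight) * semantic_consistency(recognized, target))


if __name__ == "__main__":
    # One of five characters is flagged, so the structural score drops.
    print(toy_reward("HE#LO", "HELLO"))
```

In an RL setup a score of this kind would be computed per generated image and passed back to the text-to-image model as the reward. The actual scoring used by TextPecker should be taken from the paper and repository.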
To use the model for deployment or evaluation, please follow the instructions in the official repository: https://github.com/CIawevy/TextPecker
Citation
If you find TextPecker useful in your research, please cite:
@article{zhu2026TextPecker,
title = {TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering},
author = {Zhu, Hanshen and Liu, Yuliang and Wu, Xuecheng and Wang, An-Lan and Feng, Hao and Yang, Dingkang and Feng, Chao and Huang, Can and Tang, Jingqun and Bai, Xiang},
journal = {arXiv preprint arXiv:2602.20903},
year = {2026}
}
Acknowledgement
Training was conducted using the ms-swift framework. We thank the authors of InternVL and ms-swift for their excellent open-source contributions.