Locality-Attending Vision Transformer
Paper: arXiv 2603.04892
Pretrain vision transformers so that their patch representations transfer better to dense prediction (e.g., segmentation), without changing the pretraining objective.
```python
import timm

# Load the pretrained LoCATViT-Tiny checkpoint from the Hugging Face Hub
model = timm.create_model("hf_hub:sinahmr/locatvit_tiny", pretrained=True)
```
```bibtex
@inproceedings{hajimiri2026locatvit,
  author    = {Hajimiri, Sina and Beizaee, Farzad and Shakeri, Fereshteh and Desrosiers, Christian and Ben Ayed, Ismail and Dolz, Jose},
  title     = {Locality-Attending Vision Transformer},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
```