vit_tiny_patch16_224

Image classification model from timm (PyTorch Image Models), converted for LiteRT.

Source architecture: vit_tiny_patch16_224
File: model.tflite
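One way to run the file is with the LiteRT Python interpreter. The following is a minimal sketch under stated assumptions, not an official recipe. It assumes the `ai-edge-litert` package (`ai_edge_litert.interpreter.Interpreter`), a single float32 image input, and the per-channel normalization used by timm AugReg ViT checkpoints (mean and std of 0.5). Confirm the normalization against the base model, for example with `timm.data.resolve_model_data_config`. The helper names `to_chw_normalized` and `run_model` are hypothetical, and resizing or center-cropping to 224 x 224 is left to the caller.

```python
# Minimal LiteRT inference sketch for model.tflite.
# ASSUMPTIONS (not stated in this card): the `ai-edge-litert` pip package,
# a single float32 image input, and timm AugReg-style normalization.

# ASSUMPTION: per-channel mean = std = 0.5, as in timm's AugReg ViT configs;
# confirm with timm.data.resolve_model_data_config on the base model.
MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)
IMAGE_SIZE = 224  # from "Image size: 224 x 224"


def to_chw_normalized(hwc_pixels):
    """Hypothetical helper: turn a 224 x 224 RGB image given as nested lists
    [row][col] -> (r, g, b) with 0-255 ints into a CHW nested list of
    normalized floats."""
    if len(hwc_pixels) != IMAGE_SIZE or any(
        len(row) != IMAGE_SIZE for row in hwc_pixels
    ):
        raise ValueError("expected a 224 x 224 image; resize/crop first")
    return [
        [[(px[c] / 255.0 - MEAN[c]) / STD[c] for px in row] for row in hwc_pixels]
        for c in range(3)
    ]


def run_model(model_path, chw):
    """Hypothetical helper: run one preprocessed image through the .tflite
    file and return the first (batch) row of the first output tensor."""
    # Imported lazily so to_chw_normalized works without these packages.
    import numpy as np
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    x = np.asarray(chw, dtype=np.float32)[np.newaxis]  # (1, 3, 224, 224)
    # The input layout is not documented here: if the model expects NHWC,
    # move the channel axis last.
    if tuple(inp["shape"][1:]) == (IMAGE_SIZE, IMAGE_SIZE, 3):
        x = np.ascontiguousarray(np.transpose(x, (0, 2, 3, 1)))
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])[0]
```

Usage, assuming `img` is a 224 x 224 list of RGB tuples: `scores = run_model("model.tflite", to_chw_normalized(img))`. The layout is read from the model's input details rather than hard-coded, because this card does not state whether the converted input is NCHW or NHWC.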

Model Details

Model Type: Image classification / feature backbone
Model Stats:
- Params (M): 5.7
- GMACs: 1.1
- Activations (M): 4.1
- Image size: 224 x 224
Papers:
- How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers: https://arxiv.org/abs/2106.10270
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
Dataset: ImageNet-1k
Pretrain Dataset: ImageNet-21k
Original: https://github.com/google-research/vision_transformer
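The `patch16_224` suffix fixes the input geometry. A 224 x 224 image is cut into non-overlapping 16 x 16 patches, and each patch becomes one token (the "16x16 words" of the ViT paper). ViT also prepends a single class token. The arithmetic is shown here as a small sketch; the function name `vit_sequence_length` is illustrative only.

```python
def vit_sequence_length(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Tokens seen by the transformer: one per non-overlapping square patch,
    plus prefix tokens (the original ViT uses a single class token)."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + num_prefix_tokens


print(vit_sequence_length(224, 16))  # 14 * 14 patches + 1 class token = 197
```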
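Because the classifier head was fine-tuned on ImageNet-1k, the output should contain one score per ImageNet-1k class (1000 values). This sketch assumes those scores are raw logits, as timm classifiers return, and applies a softmax followed by top-k selection to produce ranked predictions. `top_k_probs` is a hypothetical helper. Mapping indices to class names requires an ImageNet-1k label list, which this card does not include.

```python
import math


def top_k_probs(logits, k=5):
    """Softmax over raw logits, then return the k most probable
    (class_index, probability) pairs, highest first."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]
```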

Citation

@article{steiner2021augreg,
  title={How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers},
  author={Steiner, Andreas and Kolesnikov, Alexander and Zhai, Xiaohua and Wightman, Ross and Uszkoreit, Jakob and Beyer, Lucas},
  journal={arXiv preprint arXiv:2106.10270},
  year={2021}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}


Model tree for litert-community/vit_tiny_patch16_224

Base model: timm/vit_tiny_patch16_224.augreg_in21k_ft_in1k
