---
license: mit
pipeline_tag: image-classification
tags:
- vision
- vit
- image-classification
---

# Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers (ICLR 2026)

This repository contains the weights for **Jumbo**, a simple and scalable architecture that makes Vision Transformers (ViTs) faster. Jumbo reduces patch token width while increasing global token width through a new "Jumbo" token processed by a shared, wider FFN.

- **Paper:** [Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers](https://arxiv.org/abs/2502.15021)
- **GitHub Repository:** [https://github.com/antofuller/jumbo](https://github.com/antofuller/jumbo)

## Model Description

ViTs are general and accurate, but often slow. Jumbo addresses this by reducing patch token width while adding a wider Jumbo token processed by its own wider FFN. This approach increases model capacity efficiently: the Jumbo FFN processes only a single token, so it adds little compute, and its parameters are shared across all layers for memory efficiency. Crucially, Jumbo is attention-only and non-hierarchical, so it remains compatible with plain ViT methods.

## ImageNet-1K Performance

The following top-1 accuracies were achieved on ImageNet-1K:

| Model | Top-1 Accuracy |
| :--- | :--- |
| Jumbo-pico | 69.156% |
| Jumbo-nano | 74.528% |
| Jumbo-tiny | 78.366% |
| Jumbo-small | 82.558% |
| Jumbo-base | 84.954% |

## Usage

For installation, ImageNet-1K evaluation, attention visualization, and speed measurement, follow the instructions in the official repository.
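The core idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all dimensions (`d_patch`, `d_jumbo`, layer count) are hypothetical, and attention over the combined token sequence is omitted. It only shows how a single wide token can be updated by one wide FFN whose weights are shared across every layer:

```python
import numpy as np

# Hypothetical sizes for illustration only (not the paper's configurations).
rng = np.random.default_rng(0)
d_patch, d_jumbo, n_patches, n_layers = 64, 256, 196, 12

# One wide FFN, instantiated once and reused at every layer (parameter sharing).
W1 = rng.standard_normal((d_jumbo, 4 * d_jumbo)) * 0.02
W2 = rng.standard_normal((4 * d_jumbo, d_jumbo)) * 0.02

def jumbo_ffn(x):
    # Processes a single wide token, so its cost is independent of n_patches.
    return np.maximum(x @ W1, 0) @ W2

patches = rng.standard_normal((n_patches, d_patch))  # narrow patch tokens
jumbo = rng.standard_normal(d_jumbo)                 # one wide global token

for _ in range(n_layers):
    # (attention mixing patches and the Jumbo token omitted for brevity)
    jumbo = jumbo + jumbo_ffn(jumbo)  # same FFN weights at every layer

print(jumbo.shape)  # (256,)
```

Because the wide FFN touches only one token and is shared across depth, its extra parameters and FLOPs stay small while the global token's capacity grows.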
### Installation

```bash
pip install -r requirements.txt
```

### Evaluation

```bash
python eval_i1k.py --model_path YOUR_PATH/jumbo_small.pth --model_size small
```

### Measuring Speed

```bash
python measure_speed.py --model_size small
```

### Visualizing Attention Maps

```bash
python visualize_attn.py --model_path YOUR_PATH/jumbo_small.pth --model_size small --out_dir YOUR_PATH/attn_maps --num_images 50
```

## Citation

```bibtex
@article{fuller2025thicker,
  title={Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers},
  author={Fuller, Anthony and Yassin, Yousef and Kyrollos, Daniel G. and Shelhamer, Evan and Green, James R.},
  journal={arXiv preprint arXiv:2502.15021},
  year={2025}
}
```