Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
Pretrained checkpoints, features, and samples for VFM-VAE, introduced in the paper:
Tianci Bi et al., "VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models", CVPR 2026 · arXiv:2510.18457
Accepted to CVPR 2026.
@inproceedings{bi2026vfmvae,
title = {Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models},
author = {Bi, Tianci and Zhang, Xiaoyi and Lu, Yan and Zheng, Nanning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}