🧩 VFM-VAE

Pretrained checkpoints, features, and samples for VFM-VAE, introduced in the paper:

Tianci Bi et al., "VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models", CVPR 2026 · arXiv:2510.18457

🎉 Accepted to CVPR 2026.

💻 Code: github.com/tianciB/VFM-VAE
📦 Includes: alignment data, ImageNet-256 & ImageNet-512 checkpoints, and diffusion samples
🪪 License: CC-BY-NC-4.0 © 2025 Tianci Bi, Xi'an Jiaotong University
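
⬇️ Download (sketch)

A minimal sketch of fetching the files listed above with the `huggingface_hub` client, assuming the repository id is `tiancibi/VFM-VAE` as shown on the model page. The helper name `download_vfm_vae` and the default target directory are illustrative and not part of the official code. The file layout inside the repository is not documented here, so no filter is applied by default.

```python
from pathlib import Path

# Repository id as shown on the model page (assumed to be the download source).
REPO_ID = "tiancibi/VFM-VAE"


def download_vfm_vae(local_dir="VFM-VAE", allow_patterns=None):
    """Download the repository snapshot and return the local path.

    `download_vfm_vae` is a hypothetical helper name used only for illustration.
    `allow_patterns` may hold glob patterns to restrict the download; the
    default (None) fetches every file in the repository.
    """
    # Imported lazily so this module loads even when the package is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns,
    )
    return Path(path)
```

Calling `download_vfm_vae()` pulls the full snapshot, which may be large. To fetch only part of it, inspect the repository file list first and pass matching glob patterns through `allow_patterns`.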

📝 Citation

@inproceedings{bi2026vfmvae,
  title     = {Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models},
  author    = {Bi, Tianci and Zhang, Xiaoyi and Lu, Yan and Zheng, Nanning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
