🧩 VFM-VAE

Pretrained checkpoints, features, and samples for VFM-VAE, introduced in the paper:

Tianci Bi et al., "VFM-VAE: Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models", CVPR 2026 · arXiv:2510.18457

🎉 Accepted to CVPR 2026.

  • 💻 Code: github.com/tianciB/VFM-VAE
  • 📦 Includes: alignment data, ImageNet-256 & ImageNet-512 checkpoints, and diffusion samples
  • 🪪 License: CC-BY-NC-4.0 © 2025 Tianci Bi, Xi'an Jiaotong University

📝 Citation

@inproceedings{bi2026vfmvae,
  title     = {Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models},
  author    = {Bi, Tianci and Zhang, Xiaoyi and Lu, Yan and Zheng, Nanning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}