---
library_name: diffusers
license: cc-by-nc-4.0
pipeline_tag: image-to-image
tags:
  - normal-estimation
  - depth-estimation
  - diffusion
  - transparent-objects
---

# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation

This is the official repository for the paper TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation.

Project Page | GitHub

**Authors:** Mingwei Li, Hehe Fan, Yi Yang

TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression for transparent objects. It addresses challenges like complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
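To illustrate the cross-attention idea described above, here is a minimal, hypothetical sketch (not the paper's actual implementation) of how dense ViT patch tokens can condition a U-Net feature map: queries come from the U-Net features, while keys and values come from the DINO-style tokens. All dimensions, names, and shapes below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DenseSemanticCrossAttention(nn.Module):
    """Hypothetical sketch: U-Net features attend to dense ViT patch tokens."""
    def __init__(self, unet_dim: int, vit_dim: int, num_heads: int = 8):
        super().__init__()
        # kdim/vdim let keys/values keep the ViT width while queries use the U-Net width
        self.attn = nn.MultiheadAttention(
            embed_dim=unet_dim, kdim=vit_dim, vdim=vit_dim,
            num_heads=num_heads, batch_first=True,
        )
        self.norm = nn.LayerNorm(unet_dim)

    def forward(self, unet_feats: torch.Tensor, vit_tokens: torch.Tensor) -> torch.Tensor:
        # unet_feats: (B, C, H, W) -> flatten spatial dims into a token sequence (B, H*W, C)
        b, c, h, w = unet_feats.shape
        q = unet_feats.flatten(2).transpose(1, 2)
        out, _ = self.attn(self.norm(q), vit_tokens, vit_tokens)
        # Residual connection preserves the original U-Net signal
        out = (q + out).transpose(1, 2).reshape(b, c, h, w)
        return out

# Toy shapes: 320-dim U-Net features fused with 1280-dim patch tokens
layer = DenseSemanticCrossAttention(unet_dim=320, vit_dim=1280)
feats = torch.randn(2, 320, 16, 16)
tokens = torch.randn(2, 196, 1280)
fused = layer(feats, tokens)  # shape: (2, 320, 16, 16)
```

The key design point is that the fused output keeps the U-Net's spatial resolution and channel width, so the semantic conditioning can be inserted without reshaping the rest of the diffusion backbone.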

## Usage

To use this model, first set up the DINOv3 encoder separately, since its weights require access approval from Meta AI.

```python
from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map
import torch

# Create DINO encoder
# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
dino_encoder = create_dino_encoder(
    model_name="dinov3_vith16plus",
    weights_path="path/to/dinov3_vith16plus",  # Path to approved DINOv3 weights
    projector_path="./weights/transnormal/cross_attention_projector.pt",
    device="cuda",
    dtype=torch.bfloat16,
)

# Load TransNormal pipeline
pipe = TransNormalPipeline.from_pretrained(
    "longxiang-ai/transnormal-v1",
    dino_encoder=dino_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Run inference
normal_map = pipe(
    image="path/to/image.jpg",
    output_type="pil",  # Choose from "np", "pil", or "pt"
)

# Save the result
save_normal_map(normal_map, "output_normal.png")
```

## Citation

If you find our work useful, please consider citing:

```bibtex
@misc{li2026transnormal,
      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
      author={Mingwei Li and Hehe Fan and Yi Yang},
      year={2026},
      eprint={2602.00839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.00839},
}
```

## License

This project is licensed under CC BY-NC 4.0.

## Acknowledgements

This work builds upon:

- **Lotus**: diffusion-based depth and normal estimation
- **DINOv3**: self-supervised vision transformer from Meta AI
- **Stable Diffusion 2**: base diffusion model