---
library_name: diffusers
license: cc-by-nc-4.0
pipeline_tag: image-to-image
tags:
- normal-estimation
- depth-estimation
- diffusion
- transparent-objects
---
# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation

This is the official repository for the paper *TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation*.

**Authors:** Mingwei Li, Hehe Fan, Yi Yang
TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression on transparent objects. It addresses challenges such as complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
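The wavelet-based regularization is described here only at a high level. As an illustration of the general idea (a minimal sketch, not the paper's implementation — both function names are hypothetical), a single-level 2D Haar decomposition splits a map into a low-frequency subband and three high-frequency subbands, so a loss can weight fine detail explicitly:

```python
import numpy as np

def haar_decompose(x):
    """Single-level 2D Haar transform of a (H, W) array (even H, W).
    Returns the (LL, LH, HL, HH) subbands."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0  # low-frequency average
    lh = (a + b - c - d) / 4.0  # horizontal detail
    hl = (a - b + c - d) / 4.0  # vertical detail
    hh = (a - b - c + d) / 4.0  # diagonal detail
    return ll, lh, hl, hh

def wavelet_detail_loss(pred, target, hf_weight=2.0):
    """L1 loss with extra weight on the high-frequency subbands,
    encouraging fine structural detail to be preserved (hypothetical)."""
    pb, tb = haar_decompose(pred), haar_decompose(target)
    ll_term = np.abs(pb[0] - tb[0]).mean()
    hf_term = sum(np.abs(p - t).mean() for p, t in zip(pb[1:], tb[1:]))
    return ll_term + hf_weight * hf_term
```

In practice such a term would be applied per channel of the predicted normal map and added to the main regression objective with a small weight.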
## Usage
To use this model, you need to set up the DINOv3 encoder separately (as it requires access approval from Meta AI).
```python
import torch
from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map

# Create DINO encoder
# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
dino_encoder = create_dino_encoder(
    model_name="dinov3_vith16plus",
    weights_path="path/to/dinov3_vith16plus",  # Path to approved DINOv3 weights
    projector_path="./weights/transnormal/cross_attention_projector.pt",
    device="cuda",
    dtype=torch.bfloat16,
)

# Load TransNormal pipeline
pipe = TransNormalPipeline.from_pretrained(
    "longxiang-ai/transnormal-v1",
    dino_encoder=dino_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Run inference
normal_map = pipe(
    image="path/to/image.jpg",
    output_type="pil",  # Choose from "np", "pil", or "pt"
)

# Save the result
save_normal_map(normal_map, "output_normal.png")
```
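If `output_type="np"` is used, normal maps are conventionally unit vectors in [-1, 1]. A common way to visualize such an array yourself (a generic sketch, not part of the TransNormal API — the function name is hypothetical) is to remap it to 8-bit RGB:

```python
import numpy as np

def normals_to_rgb(normals):
    """Map an (H, W, 3) array of normal vectors in [-1, 1]
    to uint8 RGB values in [0, 255] for visualization."""
    rgb = ((normals + 1.0) * 0.5 * 255.0).clip(0, 255)
    return rgb.astype(np.uint8)
```

The resulting array can be saved with any image library, e.g. `PIL.Image.fromarray(rgb).save("normal_vis.png")`.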
## Citation
If you find our work useful, please consider citing:
```bibtex
@misc{li2026transnormal,
      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
      author={Mingwei Li and Hehe Fan and Yi Yang},
      year={2026},
      eprint={2602.00839},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.00839},
}
```
## License
This project is licensed under CC BY-NC 4.0.
## Acknowledgements
This work builds upon:
- **Lotus** – Diffusion-based depth and normal estimation
- **DINOv3** – Self-supervised vision transformer from Meta AI
- **Stable Diffusion 2** – Base diffusion model