OSDFace – Pretrained Weights (Mirror)
This is an unofficial mirror. All credit goes to the original authors. The weights are mirrored here from the official OSDFace repository for convenience, as the original download is hosted on OneDrive/Google Drive, which can be slow or inaccessible in some regions. Please cite the original paper and star the original repo if you use these weights.
Overview
OSDFace (One-Step Diffusion Model for Face Restoration) is a single-step diffusion model that restores degraded, low-quality face images into high-fidelity, identity-consistent outputs. It was accepted at CVPR 2025.
Unlike multi-step diffusion approaches, OSDFace requires only one forward pass through a modified Stable Diffusion 2.1 UNet, making it significantly faster at inference while achieving state-of-the-art results on both synthetic (CelebA-Test) and real-world (Wider-Test, LFW-Test, WebPhoto-Test) benchmarks.
The key innovations are:
- Visual Representation Embedder (VRE): A VQ-VAE encoder that tokenizes the low-quality input face and produces visual prompt embeddings via a vector-quantized dictionary. These embeddings replace the text encoder's output and are fed directly into the UNet's cross-attention layers.
- Facial Identity Loss: A face-recognition-derived loss that enforces identity consistency between the restored and ground-truth faces.
- GAN Guidance: A generative adversarial network guides the one-step diffusion to align the output distribution with the ground truth.
Usage
Prerequisites
- Base model: stabilityai/stable-diffusion-2-1-base
- Python 3.10, PyTorch 2.4.0, diffusers 0.27.2
Quick Start
# Clone the official repo
git clone https://github.com/jkwang28/OSDFace.git
cd OSDFace
# Download these weights into pretrained/
# Place: associate_2.ckpt, embedding_change_weights.pth, pytorch_lora_weights.safetensors
# Run inference (with LoRA merging for speed)
python infer.py \
  --input_image data/WebPhoto-Test \
  --output_dir results/WebPhoto-Test \
  --pretrained_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
  --img_encoder_weight "pretrained/associate_2.ckpt" \
  --ckpt_path pretrained \
  --merge_lora \
  --mixed_precision fp16 \
  --gpu_ids 0
**Note on the pretrained model:** although the project is based on `stabilityai/stable-diffusion-2-1-base`, we use `Manojb/stable-diffusion-2-1-base` because the former can't be downloaded from Hugging Face.
Files in This Repository
associate_2.ckpt (1.87 GB)
The VQ-VAE image encoder (referred to as the Visual Representation Embedder in the paper). This is the core component that understands the degraded input face.
It contains a multi-head encoder with downsampling blocks, a mid-block with attention, and a vector quantizer with a learned 1024-entry codebook (embedding dim 512). At inference, the encoder processes a 512×512 low-quality face, extracts spatial features, quantizes them against the codebook, and selects the 77 closest (non-duplicate) codebook entries, producing a (batch, 77, 512) tensor that acts as a drop-in replacement for CLIP text embeddings in the UNet's cross-attention.
Loaded via: --img_encoder_weight associate_2.ckpt
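The quantize-and-select step described above can be sketched as follows. This is a hypothetical reconstruction, not the repo's implementation: it ranks the 1024 codebook entries by their minimum distance to any encoder feature and keeps the 77 nearest distinct entries.

```python
import torch

def select_visual_tokens(features: torch.Tensor,
                         codebook: torch.Tensor,
                         num_tokens: int = 77) -> torch.Tensor:
    """Sketch of the VRE token selection (hypothetical, for illustration).

    features: (N, 512) flattened spatial features from the VQ-VAE encoder
    codebook: (1024, 512) learned embedding dictionary
    Returns a (num_tokens, 512) visual prompt of distinct codebook entries.
    """
    dists = torch.cdist(features, codebook)        # (N, 1024) pairwise distances
    per_entry = dists.min(dim=0).values            # closest feature per entry, (1024,)
    idx = per_entry.topk(num_tokens, largest=False).indices  # 77 distinct indices
    return codebook[idx]                           # (77, 512)
```

Batched over B images, this yields the (B, 77, 512) tensor that stands in for CLIP text embeddings.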
embedding_change_weights.pth (1.58 MB)
A lightweight embedding projection module (TwoLayerConv1x1) that maps the VRE output from 512 dimensions to 1024 dimensions, matching the hidden size expected by Stable Diffusion 2.1's UNet cross-attention layers.
Architecture: two 1×1 Conv1d layers with SiLU activation (512 → 256 → 1024), operating over the 77-token sequence.
This module is used in the default configuration (without --cat_prompt_embedding). When --cat_prompt_embedding is enabled, the VRE instead outputs 154 tokens at 512-dim which are reshaped to 77 tokens at 1024-dim, bypassing this module entirely.
Loaded from: <ckpt_path>/embedding_change_weights.pth
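A minimal sketch of this projection, assuming a single SiLU between the two 1×1 convolutions (the exact layout in the repo may differ):

```python
import torch
import torch.nn as nn

class TwoLayerConv1x1(nn.Module):
    """Hypothetical reconstruction of the 512 -> 256 -> 1024 projection."""
    def __init__(self, in_dim=512, mid_dim=256, out_dim=1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv1d(in_dim, mid_dim, kernel_size=1),   # 512 -> 256
            nn.SiLU(),
            nn.Conv1d(mid_dim, out_dim, kernel_size=1),  # 256 -> 1024
        )

    def forward(self, x):
        # x: (B, 77, 512); Conv1d expects (B, channels, length)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)  # (B, 77, 1024)
```

A 1×1 convolution over the token axis is just a per-token linear map, so this is equivalent to applying a small MLP independently to each of the 77 tokens.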
pytorch_lora_weights.safetensors (67.9 MB)
LoRA (Low-Rank Adaptation) weights for the Stable Diffusion 2.1 UNet. These adapt the frozen SD2.1 UNet to perform one-step face restoration conditioned on the VRE embeddings.
Default LoRA configuration: rank 16, alpha 16 (effective scaling factor alpha/rank = 1.0). The weights cover both standard LoRA layers (lora_A/lora_B) and some additional lora.up/lora.down layers.
These can be loaded in two ways:
- Dynamic loading (default): loaded at runtime via diffusers' `load_lora_weights()`
- Merged loading (`--merge_lora`): pre-merged into the UNet weights before inference for slightly faster execution
Loaded from: <ckpt_path>/pytorch_lora_weights.safetensors
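The two loading modes are mathematically equivalent: with rank 16 and alpha 16 the scaling factor is alpha/rank = 1.0, and merging simply folds the low-rank update into the base weight. A toy numerical check (plain tensors, not the actual UNet):

```python
import torch

torch.manual_seed(0)
d, rank, alpha = 8, 16, 16            # paper's rank/alpha config
scale = alpha / rank                  # effective scaling = 1.0

W = torch.randn(d, d)                 # frozen base weight
A = torch.randn(rank, d) * 0.01       # lora_A (down-projection)
B = torch.randn(d, rank) * 0.01       # lora_B (up-projection)
x = torch.randn(3, d)

# Dynamic path: base output plus the low-rank correction at runtime
y_dynamic = x @ W.T + scale * (x @ A.T @ B.T)

# Merged path: fold the correction into the weight once, then run plain
W_merged = W + scale * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_dynamic, y_merged, atol=1e-5)
```

Merging trades the small per-layer overhead of the dynamic path for a one-time weight update, which is why `--merge_lora` is slightly faster at inference.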
Key Inference Arguments
| Argument | Default | Description |
|---|---|---|
| `--merge_lora` | off | Merge LoRA into UNet weights (recommended) |
| `--mixed_precision` | `fp32` | Use `fp16` for faster inference / lower VRAM |
| `--gpu_ids` | `[0]` | Multi-GPU support, e.g. `--gpu_ids 0 1 2 3` |
| `--cat_prompt_embedding` | off | Alternative embedding strategy (skips the embedding_change module) |
| `--lora_rank` | 16 | LoRA rank (must match training) |
| `--lora_alpha` | 16 | LoRA alpha (must match training) |
Inference Pipeline (Summary)
- Input image resized to 512×512
- VRE encodes the LQ face → `(B, 77, 512)` visual prompt
- Embedding projection maps to `(B, 77, 1024)` (or the concatenation path)
- VAE encodes the LQ face to latent space
- UNet performs a single denoising step at timestep 399, conditioned on the visual prompt
- Predicted clean latent is decoded by the VAE → restored face
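The steps above can be summarized as a shape-level sketch. All component signatures here are hypothetical stand-ins (the real VRE, projection, VAE, and UNet come from the OSDFace repo and diffusers), and the UNet is assumed to return the clean-latent prediction directly:

```python
import torch

def osdface_one_step(lq, vre, proj, vae, unet, t=399):
    """Shape-level sketch of the one-step restoration pipeline.

    lq: (B, 3, 512, 512) low-quality face batch
    """
    prompt = proj(vre(lq))                    # (B, 77, 512) -> (B, 77, 1024)
    z = vae.encode(lq)                        # LQ latent, e.g. (B, 4, 64, 64)
    ts = torch.full((lq.shape[0],), t)        # single fixed timestep 399
    z0 = unet(z, ts, prompt)                  # one denoising pass -> clean latent
    return vae.decode(z0)                     # restored face, (B, 3, 512, 512)
```

No iterative sampling loop appears anywhere: the UNet is queried exactly once per image, which is the source of OSDFace's speed advantage over multi-step diffusion restorers.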
Citation
@InProceedings{wang2025osdface,
author = {Wang, Jingkai and Gong, Jue and Zhang, Lin and Chen, Zheng and Liu, Xing and Gu, Hong and Liu, Yutong and Zhang, Yulun and Yang, Xiaokang},
title = {{OSDFace}: One-Step Diffusion Model for Face Restoration},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {12626-12636}
}
Links
- Paper (arXiv)
- Official Repository
- Project Page