
CodeFormer Face Restoration - Project Documentation

1. Introduction

CodeFormer is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a Codebook Lookup Transformer (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.

This project wraps the core CodeFormer research code into a deployable, user-friendly Flask Web Application, containerized with Docker for easy deployment on platforms like Hugging Face Spaces.

Key Features

  • Blind Face Restoration: Restores faces from low-quality inputs without knowing the specific degradation details.
  • Background Enhancement: Uses Real-ESRGAN to upscale and enhance the non-face background regions of the image.
  • Face Alignment & Paste-back: Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
  • Adjustable Fidelity: Users can balance between restoration quality (hallucinating details) and identity fidelity (keeping the original look).

2. System Architecture

The application is built on a Python/PyTorch backend served via Flask.

2.1 Technology Stack

  • Framework: Flask (Python Web Server)
  • Deep Learning: PyTorch, TorchVision
  • Image Processing: OpenCV, NumPy, Pillow
  • Core Libraries: basicsr (Basic Super-Restoration), facelib (Face detection/utils)
  • Frontend: HTML5, Bootstrap 5, Jinja2 Templates
  • Containerization: Docker (CUDA-enabled)

2.2 Directory Structure

CodeFormer/
├── app.py                 # Main Flask application entry point
├── Dockerfile             # Container configuration
├── requirements.txt       # Python dependencies
├── basicsr/               # Core AI framework (Super-Resolution tools)
├── facelib/               # Face detection and alignment utilities
├── templates/             # HTML Frontend
│   ├── index.html         # Upload interface
│   └── result.html        # Results display
├── static/                # Static assets (css, js, uploads)
│   ├── uploads/           # Temporary storage for input images
│   └── results/           # Temporary storage for processed output
└── weights/               # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/        # CodeFormer model (.pth)
    ├── facelib/           # Detection (RetinaFace) and Parsing models
    └── realesrgan/        # Background upscaler (Real-ESRGAN)

2.3 Logic Flow

  1. Input: User uploads an image via the Web UI.
  2. Pre-processing (app.py):
    • Image is saved to static/uploads.
    • Parameters (fidelity, upscale factor) are parsed.
  3. Inference Pipeline:
    • Detection: facelib detects faces in the image using RetinaFace.
    • Alignment: Faces are cropped and aligned to a standard 512x512 resolution.
    • Restoration: The CodeFormer model processes the aligned faces.
    • Upscaling (Optional): The background is upscaled using Real-ESRGAN.
    • Paste-back: Restored faces are warped back to their original positions and blended.
  4. Output: The final image is saved to static/results and displayed to the user.
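The pipeline above can be sketched as a simple stage dispatcher. The function and stage names below are illustrative, not the app's actual API; they only mirror the ordering described in steps 1-4:

```python
def build_pipeline(enhance_background: bool, upsample_face: bool) -> list:
    """Return the ordered stage names for one restoration request (illustrative)."""
    stages = ["detect_faces",      # RetinaFace via facelib
              "align_faces",       # crop + warp to 512x512
              "restore_faces"]     # CodeFormer inference
    if enhance_background:
        stages.append("upscale_background")  # optional Real-ESRGAN pass
    if upsample_face:
        stages.append("upsample_face")       # match background resolution
    stages.append("paste_back")              # warp restored faces back + blend
    return stages
```

For example, with both options disabled the request reduces to detection, alignment, restoration, and paste-back only.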

3. Installation & Deployment

3.1 Docker Deployment (Recommended)

The project is optimized for Docker.

Prerequisites: Docker, NVIDIA GPU (optional, but recommended).

  1. Build the Image:

    docker build -t codeformer-app .
    
  2. Run the Container:

    # Run on port 7860 (Standard for HF Spaces)
    docker run -it -p 7860:7860 codeformer-app
    

    Note: To use GPU, add the --gpus all flag to the run command.
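A CUDA-enabled container only helps if the application actually selects the GPU when one is visible. A minimal device-selection sketch (not the app's exact code; it assumes PyTorch's standard `torch.cuda.is_available()` check):

```python
def pick_device() -> str:
    """Prefer CUDA when torch is installed and a GPU is visible; else fall back to CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        # torch not installed in this environment; CPU is the only option
        pass
    return "cpu"
```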

3.2 Hugging Face Spaces Deployment

This repository is configured for direct deployment to Hugging Face.

  1. Create a Docker Space on Hugging Face.
  2. Push this entire repository to the Space's Git remote.
    git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
    git push hf main
    
  3. The Space will build (approx. 5-10 mins) and launch automatically.

3.3 Local Development

  1. Install Environment:
    conda create -n codeformer python=3.8
    conda activate codeformer
    pip install -r requirements.txt
    
  2. Install Basicsr:
    python basicsr/setup.py install
    
  3. Run App:
    python app.py
    

4. User Guide (Web Interface)

4.1 Interface Controls

  • Input Image: Supports standard formats (JPG, PNG, WEBP). Drag and drop supported.
  • Fidelity Weight (w):
    • Range: 0.0 to 1.0.
    • 0.0 (Better Quality): The model "hallucinates" more detail. Results look sharp and clean but may drift from the person's original identity.
    • 1.0 (Better Identity): The model sticks closely to the original features. Results stay faithful to the photo but may be blurrier or retain more artifacts.
    • Recommended: 0.5 is a balanced default.
  • Upscale Factor:
    • Scales the final output resolution (1x, 2x, or 4x).
    • Note: Higher scaling requires more VRAM.
  • Enhance Background:
    • If checked, runs Real-ESRGAN on the non-face areas.
    • Recommendation: Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
  • Upsample Face:
    • If checked, the restored face is also upsampled to match the background resolution.
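Server-side, these controls reduce to a handful of parameters. A hedged sketch of how the form values might be validated before inference; the field and key names here are assumptions, not the app's actual ones:

```python
def parse_controls(form: dict) -> dict:
    """Clamp and validate web-form controls before inference (illustrative names)."""
    # Fidelity weight w must land in [0.0, 1.0]
    fidelity = min(max(float(form.get("fidelity", 0.5)), 0.0), 1.0)
    # Upscale factor is restricted to 1x, 2x, or 4x
    upscale = int(form.get("upscale", 2))
    if upscale not in (1, 2, 4):
        upscale = 2  # fall back to a safe default
    return {
        "fidelity_weight": fidelity,
        "upscale": upscale,
        "background_enhance": form.get("background_enhance") == "on",
        "face_upsample": form.get("face_upsample") == "on",
    }
```

Clamping rather than rejecting keeps the endpoint forgiving of slightly out-of-range slider values.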

4.2 Viewing Results

The result page features an interactive Before/After Slider. Drag the handle left and right to compare the pixels of the original versus the restored image directly.


5. Technical Details

5.1 Model Weights

The application automatically checks for and downloads the following weights to the weights/ directory on startup:

| Model | Path | Description |
| --- | --- | --- |
| CodeFormer | `weights/CodeFormer/codeformer.pth` | Main restoration model |
| RetinaFace | `weights/facelib/detection_Resnet50_Final.pth` | Face detection |
| ParseNet | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation) |
| Real-ESRGAN | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2) |
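The startup check amounts to walking a manifest and downloading whatever is absent. The actual download is presumably delegated to a helper (e.g. basicsr's URL-download utility), but the missing-file logic itself is simple; this sketch uses the paths from the table above:

```python
from pathlib import Path

# Relative paths of the required pre-trained weights (from the table above)
WEIGHT_MANIFEST = [
    "weights/CodeFormer/codeformer.pth",
    "weights/facelib/detection_Resnet50_Final.pth",
    "weights/facelib/parsing_parsenet.pth",
    "weights/realesrgan/RealESRGAN_x2plus.pth",
]

def missing_weights(root="."):
    """Return the manifest entries not yet present under `root` (illustrative)."""
    root = Path(root)
    return [rel for rel in WEIGHT_MANIFEST if not (root / rel).exists()]
```

On a fresh checkout every entry is reported missing and would be fetched; on subsequent startups the list is empty and the download step is skipped.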

5.2 Performance Notes

  • Memory: The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (like basic HF Spaces), processing a single image may take 30-60 seconds.
  • Git LFS: Image assets in this repository are tracked with Git LFS to keep the repo size manageable.

6. Credits & References