# CodeFormer Face Restoration - Project Documentation

## 1. Introduction

**CodeFormer** is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a **Codebook Lookup Transformer** (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.

This project wraps the core CodeFormer research code into a deployable, user-friendly **Flask Web Application**, containerized with **Docker** for easy deployment on platforms like Hugging Face Spaces.

### Key Features
*   **Blind Face Restoration:** Restores faces from low-quality inputs without knowing the specific degradation details.
*   **Background Enhancement:** Uses **Real-ESRGAN** to upscale and enhance the non-face background regions of the image.
*   **Face Alignment & Paste-back:** Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
*   **Adjustable Fidelity:** Users can balance between restoration quality (hallucinating details) and identity fidelity (keeping the original look).

---

## 2. System Architecture

The application is built on a Python/PyTorch backend served via Flask.

### 2.1 Technology Stack
*   **Framework:** Flask (Python Web Server)
*   **Deep Learning:** PyTorch, TorchVision
*   **Image Processing:** OpenCV, NumPy, Pillow
*   **Core Libraries:** `basicsr` (Basic Super-Restoration), `facelib` (Face detection/utils)
*   **Frontend:** HTML5, Bootstrap 5, Jinja2 Templates
*   **Containerization:** Docker (CUDA-enabled)

### 2.2 Directory Structure
```
CodeFormer/
├── app.py                 # Main Flask application entry point
├── Dockerfile             # Container configuration
├── requirements.txt       # Python dependencies
├── basicsr/               # Core AI framework (Super-Resolution tools)
├── facelib/               # Face detection and alignment utilities
├── templates/             # HTML Frontend
│   ├── index.html         # Upload interface
│   └── result.html        # Results display
├── static/                # Static assets (css, js, uploads)
│   ├── uploads/           # Temporary storage for input images
│   └── results/           # Temporary storage for processed output
└── weights/               # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/        # CodeFormer model (.pth)
    ├── facelib/           # Detection (RetinaFace) and Parsing models
    └── realesrgan/        # Background upscaler (Real-ESRGAN)
```

### 2.3 Logic Flow
1.  **Input:** User uploads an image via the Web UI.
2.  **Pre-processing (`app.py`):**
    *   Image is saved to `static/uploads`.
    *   Parameters (fidelity, upscale factor) are parsed.
3.  **Inference Pipeline:**
    *   **Detection:** `facelib` detects faces in the image using RetinaFace.
    *   **Alignment:** Faces are cropped and aligned to a standard 512x512 resolution.
    *   **Restoration:** The **CodeFormer** model processes the aligned faces.
    *   **Upscaling (Optional):** The background is upscaled using **Real-ESRGAN**.
    *   **Paste-back:** Restored faces are warped back to their original positions and blended.
4.  **Output:** The final image is saved to `static/results` and displayed to the user.
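
The steps above can be sketched as a single orchestration function. Everything below is illustrative: the stage functions are stand-ins for the `facelib`, CodeFormer, and Real-ESRGAN calls, and none of the names are taken from the actual `app.py`.

```python
# Illustrative sketch of the inference flow in Section 2.3. The stage
# functions are placeholders for the real model calls; only the ordering
# and the fidelity/upscale parameters come from this documentation.

def detect_faces(img):
    # Placeholder for RetinaFace detection via facelib.
    return img.get("faces", [])

def align_face(face, size=512):
    # Placeholder for crop-and-warp to the standard 512x512 input.
    return {**face, "size": size}

def codeformer_restore(face, w):
    # Placeholder for the CodeFormer forward pass with fidelity weight w.
    return {**face, "restored": True, "fidelity": w}

def upscale_background(img, factor):
    # Placeholder for Real-ESRGAN background upscaling.
    return {**img, "scale": factor}

def paste_back(background, faces):
    # Placeholder for warping restored faces back and blending them in.
    return {**background, "faces": faces}

def restore_image(img, fidelity=0.5, upscale=2, enhance_bg=True):
    """Detection -> alignment -> restoration -> upscaling -> paste-back."""
    restored = [codeformer_restore(align_face(f), w=fidelity)
                for f in detect_faces(img)]
    background = upscale_background(img, upscale) if enhance_bg else img
    return paste_back(background, restored)
```

The fidelity weight and upscale factor are threaded through exactly as the Web UI exposes them, which is why they surface again as the interface controls in Section 4.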

---

## 3. Installation & Deployment

### 3.1 Docker Deployment (Recommended)
The project is optimized for Docker.

**Prerequisites:** Docker, NVIDIA GPU (optional, but recommended).

1.  **Build the Image:**
    ```bash
    docker build -t codeformer-app .
    ```


2.  **Run the Container:**
    ```bash
    # Run on port 7860 (standard for HF Spaces)
    docker run -it -p 7860:7860 codeformer-app
    ```

    *Note: To use GPU, add the `--gpus all` flag to the run command.*


### 3.2 Hugging Face Spaces Deployment
This repository is configured for direct deployment to Hugging Face.

1.  Create a **Docker** Space on Hugging Face.
2.  Push this entire repository to the Space's Git remote.
    ```bash
    git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
    git push hf main
    ```

3.  The Space will build (approx. 5-10 mins) and launch automatically.


### 3.3 Local Development
1.  **Install Environment:**
    ```bash
    conda create -n codeformer python=3.8
    conda activate codeformer
    pip install -r requirements.txt
    ```

2.  **Install Basicsr:**

    ```bash
    python basicsr/setup.py install
    ```

3.  **Run App:**

    ```bash
    python app.py
    ```


---

## 4. User Guide (Web Interface)

### 4.1 Interface Controls

*   **Input Image:** Supports standard formats (JPG, PNG, WEBP). Drag and drop supported.
*   **Fidelity Weight (w):**
    *   **Range:** 0.0 to 1.0.
    *   **0.0 (Better Quality):** The model "hallucinates" more details. Results look very sharp and high-quality but may slightly alter the person's identity (look less like the original).
    *   **1.0 (Better Identity):** The model sticks strictly to the original features. Results are faithful to the original photo but might be blurrier or contain more artifacts.
    *   **Recommended:** 0.5 is a balanced default.
*   **Upscale Factor:**
    *   Scales the final output resolution (1x, 2x, or 4x).
    *   *Note: Higher scaling requires more VRAM.*
*   **Enhance Background:**
    *   If checked, runs Real-ESRGAN on the non-face areas.
    *   *Recommendation:* Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
*   **Upsample Face:**
    *   If checked, the restored face is also upsampled to match the background resolution.
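
A minimal sketch of how `app.py` might validate these controls server-side; the field names, defaults, and checkbox conventions here are assumptions for illustration, not the app's actual form schema.

```python
# Hypothetical server-side validation of the Section 4.1 controls:
# clamp fidelity to [0.0, 1.0], restrict upscale to {1, 2, 4}, and
# treat checkboxes as booleans. Field names are illustrative.

def parse_controls(form):
    """Normalise the web-form parameters to safe values."""
    fidelity = min(max(float(form.get("fidelity", 0.5)), 0.0), 1.0)
    upscale = int(form.get("upscale", 2))
    if upscale not in (1, 2, 4):
        upscale = 2  # fall back to a safe default
    return {
        "fidelity": fidelity,
        "upscale": upscale,
        "enhance_background": form.get("enhance_background") == "on",
        "upsample_face": form.get("upsample_face") == "on",
    }
```

Clamping on the server matters because slider values arrive as untrusted strings, and an out-of-range fidelity weight or upscale factor would otherwise reach the model unchecked.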

### 4.2 Viewing Results
The result page features an interactive **Before/After Slider**. Drag the handle left and right to compare the pixels of the original versus the restored image directly.

---

## 5. Technical Details

### 5.1 Model Weights
The application automatically checks for and downloads the following weights to the `weights/` directory on startup:

| Model | Path | Description |
| :--- | :--- | :--- |
| **CodeFormer** | `weights/CodeFormer/codeformer.pth` | Main restoration model. |
| **RetinaFace** | `weights/facelib/detection_Resnet50_Final.pth` | Face detection. |
| **ParseNet** | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation). |
| **Real-ESRGAN** | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2). |
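
The check-then-download step can be sketched as follows. The paths come from the table above, but the helper name and return shape are illustrative rather than the repo's actual code, and the download logic itself is omitted.

```python
# Hypothetical startup check for the Section 5.1 weight files:
# report which models still need to be downloaded. Paths match the
# table above; the function itself is illustrative.
import os

WEIGHTS = {
    "CodeFormer": "weights/CodeFormer/codeformer.pth",
    "RetinaFace": "weights/facelib/detection_Resnet50_Final.pth",
    "ParseNet": "weights/facelib/parsing_parsenet.pth",
    "Real-ESRGAN": "weights/realesrgan/RealESRGAN_x2plus.pth",
}

def missing_weights(root="."):
    """Return the names of models whose .pth files are not on disk."""
    return [name for name, rel in WEIGHTS.items()
            if not os.path.isfile(os.path.join(root, rel))]
```

Running this once at startup means the first launch downloads everything while later launches skip straight to serving.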

### 5.2 Performance Notes
*   **Memory:** The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (like basic HF Spaces), processing a single image may take 30-60 seconds.
*   **Git LFS:** Image assets in this repository are tracked with Git LFS to keep the repo size manageable.

---

## 6. Credits & References

*   **Original Paper:** [Towards Robust Blind Face Restoration with Codebook Lookup Transformer (NeurIPS 2022)](https://arxiv.org/abs/2206.11253)
*   **Authors:** Shangchen Zhou, Kelvin C.K. Chan, Chongyi Li, Chen Change Loy (S-Lab, Nanyang Technological University).
*   **Original Repository:** [sczhou/CodeFormer](https://github.com/sczhou/CodeFormer)