# CodeFormer Face Restoration - Project Documentation

## 1. Introduction

**CodeFormer** is a robust blind face restoration algorithm designed to restore old, degraded, or AI-generated face images. It utilizes a **Codebook Lookup Transformer** (VQGAN-based) to predict high-quality facial features even from severe degradation, ensuring that the restored faces look natural and faithful to the original identity.

This project wraps the core CodeFormer research code into a deployable, user-friendly **Flask Web Application**, containerized with **Docker** for easy deployment on platforms like Hugging Face Spaces.

### Key Features

* **Blind Face Restoration:** Restores faces from low-quality inputs without knowing the specific degradation details.
* **Background Enhancement:** Uses **Real-ESRGAN** to upscale and enhance the non-face background regions of the image.
* **Face Alignment & Paste-back:** Automatically detects faces, aligns them for processing, and seamlessly blends them back into the original image.
* **Adjustable Fidelity:** Users can balance restoration quality (hallucinated detail) against identity fidelity (preserving the original look).

---

## 2. System Architecture

The application is built on a Python/PyTorch backend served via Flask.
### 2.1 Technology Stack

* **Framework:** Flask (Python web server)
* **Deep Learning:** PyTorch, TorchVision
* **Image Processing:** OpenCV, NumPy, Pillow
* **Core Libraries:** `basicsr` (Basic Super Restoration), `facelib` (face detection/utils)
* **Frontend:** HTML5, Bootstrap 5, Jinja2 templates
* **Containerization:** Docker (CUDA-enabled)

### 2.2 Directory Structure

```
CodeFormer/
├── app.py              # Main Flask application entry point
├── Dockerfile          # Container configuration
├── requirements.txt    # Python dependencies
├── basicsr/            # Core AI framework (super-resolution tools)
├── facelib/            # Face detection and alignment utilities
├── templates/          # HTML frontend
│   ├── index.html      # Upload interface
│   └── result.html     # Results display
├── static/             # Static assets (css, js, uploads)
│   ├── uploads/        # Temporary storage for input images
│   └── results/        # Temporary storage for processed output
└── weights/            # Pre-trained model weights (downloaded on startup)
    ├── CodeFormer/     # CodeFormer model (.pth)
    ├── facelib/        # Detection (RetinaFace) and parsing models
    └── realesrgan/     # Background upscaler (Real-ESRGAN)
```

### 2.3 Logic Flow

1. **Input:** The user uploads an image via the web UI.
2. **Pre-processing (`app.py`):**
   * The image is saved to `static/uploads`.
   * Parameters (fidelity, upscale factor) are parsed.
3. **Inference Pipeline:**
   * **Detection:** `facelib` detects faces in the image using RetinaFace.
   * **Alignment:** Faces are cropped and aligned to a standard 512x512 resolution.
   * **Restoration:** The **CodeFormer** model processes the aligned faces.
   * **Upscaling (optional):** The background is upscaled using **Real-ESRGAN**.
   * **Paste-back:** Restored faces are warped back to their original positions and blended.
4. **Output:** The final image is saved to `static/results` and displayed to the user.

---

## 3. Installation & Deployment

### 3.1 Docker Deployment (Recommended)

The project is optimized for Docker.

**Prerequisites:** Docker; NVIDIA GPU (optional, but recommended).
1. **Build the image:**

   ```bash
   docker build -t codeformer-app .
   ```

2. **Run the container:**

   ```bash
   # Run on port 7860 (standard for HF Spaces)
   docker run -it -p 7860:7860 codeformer-app
   ```

   *Note: To use a GPU, add the `--gpus all` flag to the run command.*

### 3.2 Hugging Face Spaces Deployment

This repository is configured for direct deployment to Hugging Face.

1. Create a **Docker** Space on Hugging Face.
2. Push this entire repository to the Space's Git remote:

   ```bash
   git remote add hf git@hf.co:spaces/USERNAME/SPACE_NAME
   git push hf main
   ```

3. The Space will build (approx. 5-10 minutes) and launch automatically.

### 3.3 Local Development

1. **Install the environment:**

   ```bash
   conda create -n codeformer python=3.8
   conda activate codeformer
   pip install -r requirements.txt
   ```

2. **Install basicsr:**

   ```bash
   python basicsr/setup.py install
   ```

3. **Run the app:**

   ```bash
   python app.py
   ```

---

## 4. User Guide (Web Interface)

### 4.1 Interface Controls

* **Input Image:** Supports standard formats (JPG, PNG, WEBP). Drag and drop is supported.
* **Fidelity Weight (w):**
  * **Range:** 0.0 to 1.0.
  * **0.0 (better quality):** The model "hallucinates" more detail. Results look very sharp and high-quality but may slightly alter the person's identity (look less like the original).
  * **1.0 (better identity):** The model sticks strictly to the original features. Results are faithful to the original photo but may be blurrier or contain more artifacts.
  * **Recommended:** 0.5 is a balanced default.
* **Upscale Factor:**
  * Scales the final output resolution (1x, 2x, or 4x).
  * *Note: Higher scaling requires more VRAM.*
* **Enhance Background:**
  * If checked, runs Real-ESRGAN on the non-face areas.
  * *Recommendation:* Keep checked for full-photo restoration. Uncheck if you only care about the face or are running on limited hardware.
* **Upsample Face:**
  * If checked, the restored face is also upsampled to match the background resolution.
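The effect of the fidelity weight `w` can be illustrated conceptually as an interpolation between two feature sources: the codebook prediction (sharp, "hallucinated" detail) and the encoder features of the input (faithful to identity). This is only a toy sketch — the actual CodeFormer model applies `w` through learned controllable feature transformation modules inside the network, and `blend_features` is an illustrative name, not part of the codebase:

```python
# Toy illustration of the fidelity weight w, NOT the real CodeFormer internals.
# w = 0.0 -> rely fully on the codebook prediction ("better quality");
# w = 1.0 -> rely fully on the input's own features ("better identity").

def blend_features(codebook_feat: list[float],
                   encoder_feat: list[float],
                   w: float) -> list[float]:
    """Linearly interpolate element-wise between the two feature vectors."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("fidelity weight w must be in [0.0, 1.0]")
    return [(1.0 - w) * c + w * e
            for c, e in zip(codebook_feat, encoder_feat)]
```

At `w = 0.5` (the recommended default) the output sits halfway between the two extremes, which is why it trades a little sharpness for a result that still resembles the input face.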
### 4.2 Viewing Results

The result page features an interactive **Before/After slider**. Drag the handle left and right to compare the original and restored images pixel by pixel.

---

## 5. Technical Details

### 5.1 Model Weights

The application automatically checks for and downloads the following weights to the `weights/` directory on startup:

| Model | Path | Description |
| :--- | :--- | :--- |
| **CodeFormer** | `weights/CodeFormer/codeformer.pth` | Main restoration model. |
| **RetinaFace** | `weights/facelib/detection_Resnet50_Final.pth` | Face detection. |
| **ParseNet** | `weights/facelib/parsing_parsenet.pth` | Face parsing (segmentation). |
| **Real-ESRGAN** | `weights/realesrgan/RealESRGAN_x2plus.pth` | Background upscaler (x2). |

### 5.2 Performance Notes

* **Memory:** The full pipeline (CodeFormer + Real-ESRGAN) requires significant RAM/VRAM. On CPU-only environments (such as basic HF Spaces), processing a single image may take 30-60 seconds.
* **Git LFS:** Image assets in this repository are tracked with Git LFS to keep the repository size manageable.

---

## 6. Credits & References

* **Original Paper:** [Towards Robust Blind Face Restoration with Codebook Lookup Transformer (NeurIPS 2022)](https://arxiv.org/abs/2206.11253)
* **Authors:** Shangchen Zhou, Kelvin C.K. Chan, Chongyi Li, Chen Change Loy (S-Lab, Nanyang Technological University).
* **Original Repository:** [sczhou/CodeFormer](https://github.com/sczhou/CodeFormer)
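As an addendum, the startup weight check described in §5.1 can be sketched with the standard library alone. This is an illustrative sketch, not the app's actual code: `EXPECTED_WEIGHTS` and `missing_weights` are assumed names, and in practice a download helper (such as one from `basicsr`) would fetch whatever this check reports as absent:

```python
# Sketch of a startup check for the pre-trained weights listed in §5.1.
# Names are illustrative; only the paths come from the documentation above.
from pathlib import Path

EXPECTED_WEIGHTS = {
    "CodeFormer": "weights/CodeFormer/codeformer.pth",
    "RetinaFace": "weights/facelib/detection_Resnet50_Final.pth",
    "ParseNet": "weights/facelib/parsing_parsenet.pth",
    "Real-ESRGAN": "weights/realesrgan/RealESRGAN_x2plus.pth",
}

def missing_weights(root: str = ".") -> list[str]:
    """Return the names of models whose weight files are absent under root."""
    base = Path(root)
    return [name for name, rel in EXPECTED_WEIGHTS.items()
            if not (base / rel).is_file()]
```

Running this against a fresh checkout would report all four models as missing, which is the signal to trigger the downloads before the Flask server starts accepting uploads.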