Waifu Diffusion

A 130M-parameter diffusion model trained on 10,000 anime faces (90% monochrome) using rectified flow, patch diffusion, and CIELAB color space decoupling.

Model Details

  • Architecture: Diffusion Transformer (DiT-B) with Vision RoPE
  • Parameters: 130M
  • Training Data: 10k anime faces (80×80), 90% corrupted to grayscale
  • Training: 1280 epochs, batch size 256
  • Sampling: 50-step Euler integration
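As a sanity check on the 130M figure, the configuration used in Quick Start (hidden size 768, depth 12, patch size 4 on 80×80 inputs) can be tallied by hand. This is a rough sketch that assumes JiT uses a standard DiT block (adaLN-Zero modulation, MLP ratio 4). The card does not confirm that layout, and biases, embeddings, and the final layer are ignored.

```python
# Rough weight count for a DiT-B-style transformer.
# ASSUMPTION: standard DiT block with adaLN-Zero and MLP ratio 4;
# JiT's actual layout may differ. Biases and embeddings are ignored.

def dit_block_weights(hidden: int, mlp_ratio: int = 4) -> int:
    attn = 4 * hidden * hidden             # fused QKV (3*h*h) + output projection (h*h)
    mlp = 2 * mlp_ratio * hidden * hidden  # fc1 (h -> 4h) + fc2 (4h -> h)
    adaln = 6 * hidden * hidden            # conditioning -> shift/scale/gate for attn and MLP
    return attn + mlp + adaln

hidden, depth, patch, size = 768, 12, 4, 80
tokens = (size // patch) ** 2              # 20 x 20 patch tokens per 80x80 image
weights = depth * dit_block_weights(hidden)

print(tokens)                   # 400
print(f"{weights / 1e6:.1f}M")  # 127.4M
```

Under these assumptions the transformer blocks hold about 127M weights, most of the stated 130M, and a third of each block is the adaLN modulation.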

Versions

  • waifu_diffusion_1280_bs256.safetensors: full training (1280 epochs, batch size 256)
  • waifu_diffusion_128_bs32.safetensors: lightly trained version (128 epochs, batch size 32)

Quick Start

import torch
from safetensors.torch import load_file
from skimage import color
import numpy as np

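# Load model (JiT is the model class from the project repository,
# https://github.com/ruwwww/waifu_diffusion; import it from there)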
model = JiT(
    input_size=80,
    patch_size=4,
    in_channels=3,
    hidden_size=768,
    depth=12,
    num_heads=12,
    num_classes=1
)
state_dict = load_file("waifu_diffusion_1280_bs256.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Generate
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

with torch.no_grad():
    xt = torch.randn((1, 3, 80, 80), device=device)
    y = torch.zeros(1, dtype=torch.long, device=device)
    
    for step in range(50):
        t = torch.tensor(step / 50, device=device)
        pred_x1 = model(xt, t, y, top_idx=0, left_idx=0)
        v = (pred_x1 - xt) / max(1.0 - step / 50, 1e-2)
        xt = xt + v / 50

# Convert CIELAB → RGB: rescale L from [-1, 1] to [0, 100] and a/b to [-128, 128]
lab = torch.clamp(pred_x1[0], -1, 1).cpu().numpy()
L = (lab[0] + 1) * 50
a = lab[1] * 128
b = lab[2] * 128
rgb = color.lab2rgb(np.stack([L, a, b], axis=-1))
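The sampling loop above treats the network output as a prediction of the clean image x1 and converts it into a velocity, v = (x1_hat - xt) / (1 - t). The sketch below isolates that Euler integrator in NumPy; `predict_x1` is a hypothetical stand-in for the model. It shows that a predictor that is exactly right at every step follows a straight line and lands on the target at the final step.

```python
import numpy as np

def euler_sample(predict_x1, shape, steps=50, rng=None):
    """Euler integration of a rectified flow from noise (t=0) to data (t=1).

    predict_x1(xt, t) is a hypothetical stand-in for the model; it returns
    the predicted clean sample, from which the velocity is derived.
    """
    rng = np.random.default_rng() if rng is None else rng
    xt = rng.standard_normal(shape)
    for step in range(steps):
        t = step / steps
        x1_hat = predict_x1(xt, t)
        # Same clamp as Quick Start; inactive for steps=50 since 1 - t >= 0.02.
        v = (x1_hat - xt) / max(1.0 - t, 1e-2)
        xt = xt + v / steps  # Euler step with dt = 1 / steps
    return xt

# Toy "model" that always predicts the same target image.
target = np.full((3, 4, 4), 0.5)
out = euler_sample(lambda xt, t: target, target.shape, rng=np.random.default_rng(0))
print(np.allclose(out, target))  # True
```

At the last step 1 - t equals dt, so the update moves xt the whole remaining distance to x1_hat, which is why the Quick Start can read the image straight from `pred_x1`.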

Key Techniques

  • Rectified Flow: Straight-line paths from noise to data, sampled in 50 Euler steps (vs. the ~1000 steps typical of DDPM)
  • CIELAB Decoupling: Separate luminance (L) from color (a, b) and mask the color-channel gradients on monochrome images, so structure is learned from all 10k images and color from the ~1k color images
  • Patch Diffusion: Random 40×80 px crops act as data augmentation, effectively expanding 10k images to ~50k training samples
  • Vision RoPE: 2D rotary embeddings for spatial consistency across patches
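The rectified-flow and CIELAB-decoupling ideas combine into a single training objective. The sketch below is an assumption, not the repository's code: it uses the interpolation x_t = t * x1 + (1 - t) * x0 implied by the Quick Start sampler (noise at t = 0, data at t = 1), an MSE loss on the model's x1 prediction, and a hypothetical per-image `is_color` flag. `normalize_lab` is the inverse of the Quick Start decoding step.

```python
import numpy as np

def normalize_lab(lab):
    """(H, W, 3) CIELAB with L in [0, 100] and a/b in about [-128, 127]
    -> (3, H, W) array in about [-1, 1]; inverse of the Quick Start decoding."""
    return np.stack([lab[..., 0] / 50.0 - 1.0, lab[..., 1] / 128.0, lab[..., 2] / 128.0])

def rectified_flow_pair(x1, rng):
    """Sample t ~ U(0, 1) per image and x_t = t * x1 + (1 - t) * x0 on the
    straight path from Gaussian noise x0 (t = 0) to data x1 (t = 1)."""
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(size=(x1.shape[0], 1, 1, 1))
    return t * x1 + (1.0 - t) * x0, t

def masked_lab_loss(pred_x1, x1, is_color):
    """MSE on L for every image; MSE on a/b only where is_color is True,
    so grayscale images (a = b = 0) never push the model toward gray output."""
    err = (pred_x1 - x1) ** 2                                # (B, 3, H, W)
    lum = err[:, 0].mean()                                   # structure: all images
    mask = is_color.astype(err.dtype)[:, None, None, None]   # (B, 1, 1, 1)
    n_chroma = mask.sum() * err[0, 1:].size                  # color images x 2 x H x W
    chroma = (err[:, 1:] * mask).sum() / max(n_chroma, 1.0)  # color: color images only
    return lum + chroma

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, size=(4, 3, 8, 8))
x1[1:, 1:] = 0.0                      # images 1-3 are grayscale: a = b = 0
is_color = np.array([True, False, False, False])
xt, t = rectified_flow_pair(x1, rng)  # the model would predict x1 from (xt, t)
pred = x1.copy()
pred[1:, 1:] = 0.3                    # wrong color on grayscale images is not penalized
print(masked_lab_loss(pred, x1, is_color))  # 0.0
```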
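Patch diffusion amounts to training on random crops while telling the model where each crop sits. A minimal sketch, assuming "40×80" means 40 rows by 80 columns of an 80×80 image. The card does not say whether JiT's `top_idx`/`left_idx` are pixel offsets or patch-grid offsets (pixel offset // 4 with patch size 4), so this hypothetical `random_patch` helper returns pixel offsets and leaves any conversion to the caller.

```python
import numpy as np

def random_patch(img, patch_h=40, patch_w=80, rng=None):
    """Random (patch_h, patch_w) crop from a (C, H, W) image.

    Returns the crop plus its top-left pixel offsets, which a patch-diffusion
    model can take as position inputs (cf. top_idx / left_idx in Quick Start).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, H, W = img.shape
    top = int(rng.integers(0, H - patch_h + 1))   # upper bound is exclusive
    left = int(rng.integers(0, W - patch_w + 1))
    return img[:, top:top + patch_h, left:left + patch_w], top, left

img = np.zeros((3, 80, 80))
patch, top, left = random_patch(img, rng=np.random.default_rng(0))
print(patch.shape, left)  # (3, 40, 80) 0
```

Each 80×80 face yields 41 distinct vertical positions for a 40-row crop, which is where the augmentation effect comes from.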
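Vision RoPE gives each patch token a rotary embedding indexed by its 2D grid position. A common construction, assumed here because the card does not give JiT's exact variant, splits each attention head's dimensions in half and applies standard 1D RoPE with the row index to one half and the column index to the other. Attention scores then depend only on relative offsets, so the score between two tokens inside a crop matches the score between the same two tokens in the full image.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x (..., d) by angles pos * freq."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)                # (d/2,)
    ang = np.asarray(pos, dtype=float)[..., None] * freqs   # (..., d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x, row, col):
    """2D RoPE sketch: first half of the head dim encodes the row,
    second half encodes the column."""
    h = x.shape[-1] // 2
    return np.concatenate([rope_1d(x[..., :h], row), rope_1d(x[..., h:], col)], axis=-1)

# Grid coordinates for the 20 x 20 tokens of an 80 x 80 image with patch size 4.
rows, cols = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
tokens = np.random.default_rng(0).standard_normal((400, 64))  # hypothetical head dim 64
rotated = rope_2d(tokens, rows.ravel(), cols.ravel())
```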

Links

  • GitHub: https://github.com/ruwwww/waifu_diffusion

Citation

@misc{waifu_diffusion_2026,
  author = {Abdurrahman Izzuddin Al Faruq},
  title = {Training a Waifu Diffusion Model with Patch Diffusion and Rectified Flow},
  year = {2026},
  url = {https://github.com/ruwwww/waifu_diffusion}
}

License

MIT
