| --- |
| license: mit |
| --- |
| # Flow Matching & Diffusion Prediction Types |
| ## A Practical Guide to Sol, Lune, and Epsilon Prediction |
|
|
| --- |
|
|
| ## Overview |
|
|
| This document covers three distinct prediction paradigms used in diffusion and flow-matching models. Each was designed for different purposes and requires specific sampling procedures. |
|
|
| | Model | Prediction Type | What It Learned | Output Character | |
| |-------|----------------|-----------------|------------------| |
| | **Standard SD1.5** | ε (epsilon/noise) | Remove noise | General purpose | |
| | **Sol** | v (velocity) via DDPM | Geometric structure | Flat silhouettes, mass placement | |
| | **Lune** | v (velocity) via flow | Texture and detail | Rich, detailed images | |
|
|
| --- |
|
|
| SD15-Flow-Sol (velocity prediction, converted to epsilon for sampling): |
|
|
| https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_sol.py |
| |
|  |
| |
| |
| SD15-Flow-Lune (rectified flow shift=2): |
| |
| https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_lune.py |
|
|
|  |
|
|
|
|
| TinyFlux-Lailah |
|
|
| TinyFlux is still in the training and planning stage and is not yet ready for production use. |
|
|
| https://huggingface.co/AbstractPhil/tiny-flux-deep |
|
|
|  |
|
|
|
|
| ## 1. Epsilon (ε) Prediction: Standard Diffusion |
|
|
| ### Core Concept |
| > **"Predict the noise that was added"** |
|
|
| The model learns to identify and remove noise from corrupted images. |
|
|
| ### The Formula (Simplified) |
|
|
| ``` |
| TRAINING: |
| x_noisy = √(α) * x_clean + √(1-α) * noise |
| ↓ |
| Model predicts: ε̂ = "what noise was added?" |
| ↓ |
| Loss = ||ε̂ - noise||² |
| |
| SAMPLING: |
| Start with pure noise |
| Repeatedly ask: "what noise is in this?" |
| Subtract a fraction of predicted noise |
| Repeat until clean |
| ``` |
|
|
| ### Reading the Math |
|
|
| - **α (alpha)**: "How much original image remains" (1 = all original, 0 = all noise). Here α is the cumulative value `alphas_cumprod[t]` |
| - **√(1-α)**: "How much noise was mixed in" |
| - **ε**: The actual noise that was added |
| - **ε̂**: Model's guess of what noise was added |
|
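To make the two weights concrete, the sketch below evaluates them for a few values of α. The helper name `mix_weights` and the sample values are assumptions chosen for illustration; they do not come from any library or from the training code.

```python
import math

def mix_weights(alpha_bar: float) -> tuple[float, float]:
    """Return (signal_weight, noise_weight) for a cumulative alpha in [0, 1]."""
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)

for alpha_bar in (0.99, 0.5, 0.01):  # illustrative values only
    s, n = mix_weights(alpha_bar)
    # The squared weights always sum to 1, so the mix keeps unit variance
    # when the clean data and the noise both have unit variance.
    print(f"alpha={alpha_bar:.2f}  signal={s:.3f}  noise={n:.3f}  s^2+n^2={s * s + n * n:.3f}")
```

Because the squared weights sum to one, raising the noise weight always lowers the signal weight by a matching amount.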
|
| ### Training Process |
|
|
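| ```python |
| import torch |
| import torch.nn.functional as F |
| |
| # x_clean, t, model and scheduler are provided by the surrounding training loop |
| # Forward diffusion (corruption) |
| noise = torch.randn_like(x_clean) |
| # Cumulative alpha per sample, reshaped to broadcast over (B, C, H, W) |
| alpha_bar = scheduler.alphas_cumprod[t].view(-1, 1, 1, 1) |
| x_noisy = alpha_bar.sqrt() * x_clean + (1 - alpha_bar).sqrt() * noise |
| |
| # Model predicts noise |
| eps_pred = model(x_noisy, t) |
| |
| # Loss: "Did you correctly identify the noise?" |
| loss = F.mse_loss(eps_pred, noise) |
| ``` |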
|
|
| ### Sampling Process |
|
|
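| ```python |
| # DDPM/DDIM sampling with a diffusers-style scheduler |
| for t in scheduler.timesteps:  # already ordered 999 → 0 |
|     eps_pred = model(x, t) |
|     # step() returns an output object; the updated sample is .prev_sample |
|     x = scheduler.step(eps_pred, t, x).prev_sample  # Removes predicted noise |
| ``` |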
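The scheduler call hides the update rule. As a rough sketch of what a single step does, the deterministic DDIM update (eta = 0) is written out below on plain floats; DDPM's own step additionally injects fresh noise. The helper `ddim_step`, the toy `alpha_bars` schedule, and the noise "oracle" that stands in for the model are all assumptions made for illustration.

```python
import math

def ddim_step(x: float, eps_pred: float, a_t: float, a_prev: float) -> float:
    """One deterministic DDIM update from cumulative alpha a_t to a_prev."""
    # Estimate the clean value implied by the predicted noise.
    x0_hat = (x - math.sqrt(1.0 - a_t) * eps_pred) / math.sqrt(a_t)
    # Re-noise that estimate to the next, less noisy level.
    return math.sqrt(a_prev) * x0_hat + math.sqrt(1.0 - a_prev) * eps_pred

# Toy setup: one scalar "image", one noise draw, a hand-picked schedule.
x_clean, noise = 0.7, -1.3
alpha_bars = [0.05, 0.3, 0.6, 0.9, 1.0]  # cumulative alpha rises as noise is removed

x = math.sqrt(alpha_bars[0]) * x_clean + math.sqrt(1.0 - alpha_bars[0]) * noise
for a_t, a_prev in zip(alpha_bars[:-1], alpha_bars[1:]):
    eps_pred = noise  # stand-in for model(x, t): an oracle that knows the true noise
    x = ddim_step(x, eps_pred, a_t, a_prev)

print(abs(x - x_clean) < 1e-9)
```

With a perfect noise prediction every step lands exactly on the noised version of `x_clean` at the next level, so the loop ends at the clean value. A real model only approximates the noise, which is why many small steps are used.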
|
|
| ### Utility & Behavior |
|
|
| - **Strength**: General-purpose image generation |
| - **Weakness**: No explicit understanding of image structure |
| - **Use case**: Standard text-to-image generation |
|
|
| --- |
|
|
| ## 2. Velocity (v) Prediction: Sol (DDPM Framework) |
|
|
| ### Core Concept |
| > **"Predict the direction from noise to data"** |
|
|
| Sol predicts velocity but operates within the DDPM scheduler framework, requiring conversion from velocity to epsilon for sampling. |
|
|
| ### The Formula (Simplified) |
|
|
| ``` |
| TRAINING: |
| x_t = α * x_clean + σ * noise (same as DDPM) |
| v = α * noise - σ * x_clean (velocity target) |
| ↓ |
| Model predicts: v̂ = "which way is the image?" |
| ↓ |
| Loss = ||v̂ - v||² |
| |
| SAMPLING: |
| Convert velocity → epsilon |
| Use standard DDPM scheduler stepping |
| ``` |
|
|
| ### Reading the Math |
|
|
| - **v (velocity)**: Direction vector in latent space |
| - **α (alpha)**: √(α_cumprod), the signal strength |
| - **σ (sigma)**: √(1 - α_cumprod), the noise strength |
| - **The velocity formula**: `v = α * ε - σ * x₀` |
| - "Velocity is the signal-weighted noise minus noise-weighted data" |
|
|
| ### Why Velocity in DDPM? |
|
|
| Sol was trained with David (the geometric assessor) providing loss weighting. This setup used: |
| - DDPM noise schedule for interpolation |
| - Velocity prediction for training target |
| - Knowledge distillation from a teacher |
|
|
| The result: Sol learned **geometric structure** rather than textures. |
|
|
| ### Training Process (David-Weighted) |
|
|
| ```python |
| import torch |
| |
| # DDPM-style corruption |
| noise = torch.randn_like(latents) |
| t = torch.randint(0, 1000, (batch,)) |
| alpha_bar = scheduler.alphas_cumprod[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W) |
| alpha = alpha_bar.sqrt()        # signal strength |
| sigma = (1 - alpha_bar).sqrt()  # noise strength |
| |
| x_t = alpha * latents + sigma * noise |
| |
| # Velocity target (NOT epsilon!) |
| v_target = alpha * noise - sigma * latents |
| |
| # Model predicts velocity |
| v_pred = model(x_t, t) |
| |
| # David assesses geometric quality → adjusts loss weights |
| # (david_assessor, features and weighted_MSE belong to the training code and are not defined here) |
| loss_weights = david_assessor(features, t) |
| loss = weighted_MSE(v_pred, v_target, loss_weights) |
| ``` |
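`weighted_MSE` is not defined in this document. One plausible reading is a per-sample mean squared error that is scaled by the assessor's weight and then averaged over the batch. The sketch below implements that reading on plain Python lists; the name `weighted_mse`, the list layout, and the normalisation are assumptions, not the actual training code.

```python
def weighted_mse(pred, target, weights):
    """Per-sample MSE scaled by a weight, averaged over the batch.

    pred, target: list of samples, each a flat list of floats.
    weights: one non-negative float per sample (hypothetical assessor output).
    """
    if not (len(pred) == len(target) == len(weights)):
        raise ValueError("pred, target and weights must have the same batch size")
    total = 0.0
    for p, t, w in zip(pred, target, weights):
        per_sample = sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
        total += w * per_sample
    return total / len(pred)

# Two samples with per-sample errors 1.0 and 4.0; the second is down-weighted.
loss = weighted_mse([[1.0, 1.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]], [1.0, 0.5])
```

With all weights equal to 1 this reduces to ordinary MSE, so the weights only change how much each sample contributes to the gradient.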
|
|
| ### Sampling Process (CRITICAL: v → ε conversion) |
|
|
| ```python |
| from diffusers import DDPMScheduler  # scheduler class from the diffusers library |
| |
| # Must convert velocity to epsilon for DDPM scheduler |
| scheduler = DDPMScheduler(num_train_timesteps=1000) |
| |
| for t in scheduler.timesteps:  # 999, 966, 933, ... → 0 |
|     v_pred = model(x, t) |
| |
|     # Convert velocity → epsilon |
|     alpha = scheduler.alphas_cumprod[t].sqrt() |
|     sigma = (1 - scheduler.alphas_cumprod[t]).sqrt() |
| |
|     # Solve: v = α*ε - σ*x₀ and x_t = α*x₀ + σ*ε |
|     # Result: x₀ = (α*x_t - σ*v) / (α² + σ²)   (the denominator equals 1) |
|     #         ε = (x_t - α*x₀) / σ |
|     x0_hat = (alpha * x - sigma * v_pred) / (alpha**2 + sigma**2) |
|     eps_hat = (x - alpha * x0_hat) / sigma |
| |
|     # Standard DDPM step with epsilon; step() returns an object holding .prev_sample |
|     x = scheduler.step(eps_hat, t, x).prev_sample |
| ``` |
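The conversion can be checked in isolation on scalars. The sketch below builds `x_t` and `v` from a known clean value and noise draw, then inverts them; the helper name `v_to_x0_eps` and the toy numbers are assumptions for illustration. It also uses the equivalent closed form `ε = σ·x_t + α·v`, which follows from the same two equations and avoids dividing by σ.

```python
import math

def v_to_x0_eps(x_t: float, v: float, alpha_bar: float) -> tuple[float, float]:
    """Recover (x0, eps) from a velocity value at cumulative alpha `alpha_bar`."""
    alpha = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    x0 = alpha * x_t - sigma * v   # alpha^2 + sigma^2 == 1, so no division is needed
    eps = sigma * x_t + alpha * v  # same value as (x_t - alpha * x0) / sigma
    return x0, eps

# Round trip: build x_t and v from known (x0, eps), then invert.
x0_true, eps_true, alpha_bar = 0.4, -0.9, 0.36
alpha, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
x_t = alpha * x0_true + sigma * eps_true
v = alpha * eps_true - sigma * x0_true
x0_rec, eps_rec = v_to_x0_eps(x_t, v, alpha_bar)
```

If the recovered values do not match the originals in a check like this, the sign convention of `v` or the definition of α and σ differs from the one used in training.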
|
|
| ### Utility & Behavior |
|
|
| - **What Sol learned**: Platonic forms, silhouettes, mass distribution |
| - **Visual output**: Flat geometric shapes, correct spatial layout, no texture |
| - **Why this happened**: David rewarded geometric coherence, Sol optimized for clean David classification |
| - **Use case**: Structural guidance, composition anchoring, "what goes where" |
|
|
| ### Sol's Unique Property |
|
|
| Sol never "collapsed"; it learned the **skeleton** of images: |
| - Castle prompt → Castle silhouette, horizon line, sky gradient |
| - Portrait prompt → Head oval, shoulder mass, figure-ground separation |
| - City prompt → Building masses, street perspective, light positions |
|
|
| This is the "WHAT before HOW" that most diffusion models skip. |
|
|
| --- |
|
|
| ## 3. Velocity (v) Prediction: Lune (Rectified Flow) |
|
|
| ### Core Concept |
| > **"Predict the straight-line direction from noise to data"** |
|
|
| Lune uses true rectified flow matching where data travels in straight lines through latent space. |
|
|
| ### The Formula (Simplified) |
|
|
| ``` |
| TRAINING: |
| x_t = σ * noise + (1-σ) * data (linear interpolation) |
| v = noise - data (constant velocity) |
| ↓ |
| Model predicts: v̂ = "straight line to noise" |
| ↓ |
| Loss = ||v̂ - v||² |
| |
| SAMPLING: |
| Start at σ=1 (noise) |
| Walk OPPOSITE to velocity (toward data) |
| End at σ=0 (clean image) |
| ``` |
|
|
| ### Reading the Math |
|
|
| - **σ (sigma)**: Interpolation parameter (1 = noise, 0 = data) |
| - **x_t = σ·noise + (1-σ)·data**: Linear blend between noise and data |
| - **v = noise - data**: The velocity is CONSTANT along the path |
| - **Shift function**: `σ' = shift·σ / (1 + (shift-1)·σ)` |
| - For shift > 1 this maps every σ in (0, 1) to a larger value (σ = 0.5 becomes 0.75 at shift = 3), so the schedule concentrates its steps at high noise levels and takes larger steps near the clean end |
| |
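Evaluating the shift formula at a few points shows which direction it moves σ and how it changes the step sizes. The sketch below uses a scalar `shift_sigma` helper written directly from the formula; the step count and shift value are illustrative assumptions.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Apply sigma' = shift*sigma / (1 + (shift-1)*sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

steps, shift = 5, 3.0  # illustrative values
uniform = [1.0 - i / steps for i in range(steps + 1)]  # 1.0, 0.8, ..., 0.0
shifted = [shift_sigma(s, shift) for s in uniform]
# Distance covered by each Euler step along the shifted schedule.
step_sizes = [a - b for a, b in zip(shifted[:-1], shifted[1:])]

for s, s_shifted in zip(uniform, shifted):
    print(f"sigma={s:.1f} -> shifted={s_shifted:.3f}")
```

The endpoints 0 and 1 stay fixed, every interior value is pushed upward, and the step sizes grow as σ approaches 0, so the high-noise part of the trajectory is sampled most finely.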
| ### Key Difference from Sol |
| |
| | Aspect | Sol | Lune | |
| |--------|-----|------| |
| | Interpolation | DDPM (α, σ from scheduler) | Linear (σ, 1-σ) | |
| | Velocity meaning | Complex (α·ε - σ·x₀) | Simple (noise - data) | |
| | Sampling | Convert v→ε, use scheduler | Direct Euler integration | |
| | Output | Geometric skeletons | Detailed images | |
| |
| ### Training Process |
| |
| ```python |
| import torch |
| import torch.nn.functional as F |
| |
| # Linear interpolation (NOT DDPM schedule!) |
| noise = torch.randn_like(latents) |
| sigma = torch.rand(batch, device=latents.device)  # Random sigma in [0, 1) |
| |
| # Apply shift during training |
| sigma = (shift * sigma) / (1 + (shift - 1) * sigma) |
| sigma_b = sigma.view(-1, 1, 1, 1)  # broadcast over (B, C, H, W) |
| |
| x_t = sigma_b * noise + (1 - sigma_b) * latents |
| |
| # Velocity target: direction FROM data TO noise |
| v_target = noise - latents |
| |
| # Model predicts velocity; timestep = σ * 1000, one value per sample |
| v_pred = model(x_t, sigma * 1000) |
| |
| loss = F.mse_loss(v_pred, v_target) |
| ``` |
| |
| ### Sampling Process (Direct Euler) |
| |
| ```python |
| import torch |
| |
| def shift_sigma(sigmas, shift): |
|     # σ' = shift·σ / (1 + (shift-1)·σ) |
|     return shift * sigmas / (1 + (shift - 1) * sigmas) |
| |
| # Start from pure noise (σ = 1) |
| x = torch.randn(1, 4, 64, 64) |
| |
| # Sigma schedule: 1 → 0 with shift (steps = number of Euler steps, set by the caller) |
| sigmas = torch.linspace(1, 0, steps + 1) |
| sigmas = shift_sigma(sigmas, shift=3.0) |
| |
| for i in range(steps): |
|     sigma = sigmas[i] |
|     sigma_next = sigmas[i + 1] |
|     dt = sigma - sigma_next  # Positive (going from 1 toward 0) |
| |
|     timestep = sigma * 1000 |
|     v_pred = model(x, timestep) |
| |
|     # SUBTRACT velocity (v points toward noise, we go toward data) |
|     x = x - v_pred * dt |
| |
| # x is now clean image latent |
| ``` |
| |
| ### Why SUBTRACT the Velocity? |
| |
| ``` |
| v = noise - data (points FROM data TO noise) |
| |
| We want to go FROM noise TO data (opposite direction!) |
| |
| So: x_new = x_current - v * dt |
| = x_current - (noise - data) * dt |
| = x_current + (data - noise) * dt ← Moving toward data ✓ |
| ``` |
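The sign argument can be confirmed on a toy problem. In the sketch below a single scalar stands in for the image, and an "oracle" that returns the exact velocity stands in for a trained model; `euler_sample`, `oracle`, and the four-step schedule are assumptions for illustration only.

```python
def euler_sample(x_start: float, velocity, sigmas: list[float]) -> float:
    """Integrate dx/dsigma = v from sigmas[0] down to sigmas[-1] with Euler steps."""
    x = x_start
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        dt = sigma - sigma_next          # positive, since sigma decreases
        x = x - velocity(x, sigma) * dt  # subtract: v points toward noise
    return x

data, noise = 2.0, -0.5  # toy scalar "image" and noise draw

def oracle(x: float, sigma: float) -> float:
    """Stand-in for a perfectly trained model: the constant velocity noise - data."""
    return noise - data

sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]  # illustrative 4-step schedule

x_final = euler_sample(noise, oracle, sigmas)  # start at sigma = 1, i.e. pure noise
x_wrong = noise + (noise - data) * 1.0         # adding v instead moves away from data
```

Because the true velocity is constant along a straight path, Euler integration is exact here regardless of step count; a learned velocity field is only approximately constant, so more steps reduce the error.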
| |
| ### Utility & Behavior |
| |
| - **What Lune learned**: Rich textures, fine details, realistic rendering |
| - **Visual output**: Full detailed images with lighting, materials, depth |
| - **Training focus**: Portrait/pose data with caption augmentation |
| - **Use case**: High-quality image generation, detail refinement |
| |
| --- |
| |
| ## Comparison Summary |
| |
| ### Training Targets |
| |
| ``` |
| EPSILON (ε): target = noise |
| "What random noise was added?" |
| |
| VELOCITY (Sol): target = α·noise - σ·data |
| "What's the DDPM-weighted direction?" |
| |
| VELOCITY (Lune): target = noise - data |
| "What's the straight-line direction?" |
| ``` |
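For a side-by-side view, the sketch below computes all three targets for the same data and noise pair. The function name `training_targets` and the numeric inputs are assumptions for illustration; α and σ follow the Sol section's definitions (square roots of the cumulative alpha and its complement).

```python
import math

def training_targets(data: float, noise: float, alpha_bar: float) -> dict[str, float]:
    """Return the regression target each paradigm uses for one (data, noise) pair."""
    alpha, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return {
        "epsilon": noise,                     # standard SD1.5
        "sol": alpha * noise - sigma * data,  # DDPM-weighted velocity
        "lune": noise - data,                 # rectified-flow velocity
    }

targets = training_targets(data=1.0, noise=0.5, alpha_bar=0.64)  # illustrative values
```

Only the Sol target depends on the noise level; the epsilon and Lune targets are the same at every timestep for a given pair.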
| |
| ### Sampling Directions |
| |
| ``` |
| EPSILON: x_new = scheduler.step(ε_pred, t, x) |
| Scheduler handles noise removal internally |
| |
| VELOCITY (Sol): Convert v → ε, then scheduler.step(ε, t, x) |
| Must translate to epsilon for DDPM math |
| |
| VELOCITY (Lune): x_new = x - v_pred * dt |
| Direct Euler integration, subtract velocity |
| ``` |
| |
| ### Visual Intuition |
| |
| ``` |
| EPSILON: |
| "There's noise hiding the image" |
| "I'll predict and remove the noise layer by layer" |
| → General-purpose denoising |
| |
| VELOCITY (Sol): |
| "I know which direction the image is" |
| "But I speak through DDPM's noise schedule" |
| → Learned structure, outputs skeletons |
| |
| VELOCITY (Lune): |
| "Straight line from noise to image" |
| "I'll walk that line step by step" |
| → Learned detail, outputs rich images |
| ``` |
| |
| --- |
| |
| ## Practical Implementation Checklist |
| |
| ### For Epsilon Models (Standard SD1.5) |
| - [ ] Use DDPM/DDIM/Euler scheduler |
| - [ ] Pass timestep as an integer in [0, 999] |
| - [ ] Scheduler handles everything |
| |
| ### For Sol (Velocity + DDPM) |
| - [ ] Use DDPMScheduler |
| - [ ] Model outputs velocity, NOT epsilon |
| - [ ] Convert: `x0 = (α·x - σ·v) / (α² + σ²)`, then `ε = (x - α·x0) / σ` |
| - [ ] Call `scheduler.step(ε, t, x)` |
| - [ ] Expect geometric/structural output |
| |
| ### For Lune (Velocity + Flow) |
| - [ ] NO scheduler needed; use direct Euler integration |
| - [ ] Sigma goes 1 → 0 (not 0 → 1!) |
| - [ ] Apply shift: `σ' = shift·σ / (1 + (shift-1)·σ)` |
| - [ ] Timestep to model: `σ * 1000` |
| - [ ] SUBTRACT velocity: `x = x - v * dt` |
| - [ ] Expect detailed textured output |
| |
| --- |
| |
| ## Why This Matters for TinyFlux |
| |
| TinyFlux can leverage both experts: |
| |
| 1. **Sol (early timesteps)**: Provides geometric anchoring |
| - "Where should the castle be?" |
| - "What's the horizon line?" |
| - "How is mass distributed?" |
| |
| 2. **Lune (mid/late timesteps)**: Provides detail refinement |
| - "What texture is the stone?" |
| - "How does light fall?" |
| - "What color is the sky?" |
| |
| By combining geometric structure (Sol) with textural detail (Lune), TinyFlux can achieve better composition AND quality than either alone. |
| |
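This document does not specify how the two experts are combined. Purely as an illustration of the "structure early, detail later" idea, the sketch below maps a noise level σ to a pair of blending weights with a linear cross-fade. The function `expert_weights`, the `handoff` and `width` parameters, and their default values are all hypothetical and are not taken from TinyFlux.

```python
def expert_weights(sigma: float, handoff: float = 0.7, width: float = 0.1) -> tuple[float, float]:
    """Hypothetical (sol_weight, lune_weight) for a noise level sigma in [0, 1].

    sigma = 1 is pure noise (the start of sampling). Above `handoff` the
    structural expert dominates; below it the detail expert takes over, with a
    linear cross-fade of half-width `width`. Both defaults are made-up values.
    """
    lo, hi = handoff - width, handoff + width
    sol = min(1.0, max(0.0, (sigma - lo) / (hi - lo)))
    return sol, 1.0 - sol

for sigma in (1.0, 0.7, 0.3):  # start, handoff point, late refinement
    print(sigma, expert_weights(sigma))
```

A hard switch, a learned gate, or using the experts only as training-time teachers would be equally consistent with the description above.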
| --- |
| |
| ## Quick Reference Card |
| |
| ``` |
| ┌─────────────────────────────────────────────────────────────┐ |
| │                      PREDICTION TYPES                       │ |
| ├─────────────────────────────────────────────────────────────┤ |
| │  EPSILON (ε)                                                │ |
| │    Train:  target = noise                                   │ |
| │    Sample: scheduler.step(ε_pred, t, x)                     │ |
| │    Output: General images                                   │ |
| ├─────────────────────────────────────────────────────────────┤ |
| │  VELOCITY - SOL (DDPM framework)                            │ |
| │    Train:  target = α·ε - σ·x₀                              │ |
| │    Sample: v→ε conversion, then scheduler.step(ε, t, x)     │ |
| │    Output: Geometric skeletons                              │ |
| ├─────────────────────────────────────────────────────────────┤ |
| │  VELOCITY - LUNE (Rectified Flow)                           │ |
| │    Train:  target = noise - data                            │ |
| │    Sample: x = x - v·dt  (Euler, σ: 1→0)                    │ |
| │    Output: Detailed textured images                         │ |
| └─────────────────────────────────────────────────────────────┘ |
| ``` |
| |
| --- |
| |
| *Document Version: 1.0* |
| *Last Updated: January 2026* |
| *Authors: AbstractPhil & Claude OPUS 4.5* |
| |
| License: MIT |