# 🧠 DiT (Diffusion Transformer) Fine-Tuning Experiments
**Core Backbone for the Zulense Z1 Foundation Model**
This repository hosts the Diffusion Transformer (DiT) checkpoints trained to generate educational video content. These models operate in the latent space of our Causal VAE and are responsible for the temporal consistency and logical flow of the generated math lectures.
## 📊 Model Ledger & Performance
We are releasing the training logs to demonstrate the optimization curve of the "Imagination Engine."
### 1. `finetune_2_pytorch_model.bin` (🏆 Production Candidate)
- Role: The Z1 Foundation Backbone
- Status: ✅ Converged / High Fidelity
- Performance:
  - This checkpoint represents our stable run. It successfully learned to align temporal attention with the "teacher's movement" and "blackboard writing" logic.
  - Metrics: Achieved the target validation loss on the Class 5 & 8 Math dataset.
  - Behavior: Shows strong temporal coherence (objects do not disappear randomly) and adheres to the physics of writing on a board.
- Recommendation: Use this file for all inference tasks related to Zulense Z1.
### 2. `finetune_1_pytorch_model.bin` (Experimental / Deprecated)
- Role: Initial Warmup Run
- Status: ⚠️ Underfitted / High Noise
- Performance:
  - This was an early checkpoint in which the model struggled to decouple the background (classroom) from the foreground (teacher).
  - Issues: Resulted in "flickering" artifacts and poor text alignment.
  - Archived: Kept here for research comparison, to show the impact of the improved data scheduling in `finetune_2`.
## 🏗️ Architecture Context
The Zulense Video Pipeline follows a two-stage generation process:
- Stage 1 (VAE): Compresses video into latents (see `causal_vae_checkpoint`).
- Stage 2 (DiT): This model (`finetune_2`) acts as the denoising backbone, predicting the latent patches over time based on text prompts (e.g., "Draw a triangle with 3 angles").
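The two-stage flow above can be sketched end to end. This is a toy illustration with stub functions and Python lists standing in for tensors; the function names, shapes, and the 8x compression factor are assumptions for demonstration, not the real Zulense pipeline.

```python
# Minimal sketch of the two-stage pipeline (stubs, not the real models).

def vae_encode(frames):
    """Stage 1: compress each video frame into a smaller latent (stub)."""
    # A real Causal VAE maps (T, H, W, C) pixels to compact latents;
    # here we fake an 8x "compression" by subsampling each frame.
    return [frame[::8] for frame in frames]

def dit_denoise(latents, prompt):
    """Stage 2: the DiT backbone predicts denoised latent patches (stub)."""
    # The real model runs iterative denoising conditioned on the text prompt;
    # the stub just passes the latents through unchanged.
    return latents

def vae_decode(latents):
    """Stage 1 decoder: expand latents back into frames (stub)."""
    return [x * 8 for x in latents]  # crude 8x "decompression"

frames = [list(range(64)) for _ in range(4)]  # 4 toy "frames" of 64 values
latents = vae_encode(frames)
denoised = dit_denoise(latents, "Draw a triangle with 3 angles")
video = vae_decode(denoised)
print(len(latents[0]), len(video[0]))  # 8 64
```

The point of the structure is that the DiT never touches raw pixels: it only ever sees the compact latents produced by the VAE, which is what makes long temporal contexts tractable.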
## 💻 Usage (Loading Weights)
```python
import torch

# Path to the best-performing checkpoint
model_path = "finetune_2_pytorch_model.bin"

# Load weights (assuming a standard DiT structure)
state_dict = torch.load(model_path, map_location="cpu")

print(f"✅ Loaded DiT Backbone: {model_path}")
print(f"Tensor keys found: {len(state_dict.keys())}")
```
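The object returned by `torch.load` is an ordinary mapping from parameter names to tensors, so you can sanity-check the checkpoint's structure before binding it to a model. The sketch below groups keys by their top-level module; the key names shown are mock examples (the actual layout depends on the DiT implementation used in training).

```python
from collections import Counter

# Mock parameter names standing in for state_dict.keys();
# the real key layout depends on the training code.
mock_keys = [
    "x_embedder.proj.weight",
    "t_embedder.mlp.0.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.attn.proj.weight",
    "blocks.1.attn.qkv.weight",
    "final_layer.linear.weight",
]

# Group by the top-level module name for a quick structural overview.
groups = Counter(key.split(".")[0] for key in mock_keys)
for module, count in sorted(groups.items()):
    print(f"{module}: {count} tensor(s)")
```

A mismatch between these groups and your model definition (e.g. a missing `final_layer`) is usually the first sign you are loading the wrong checkpoint.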