# 🧠 DiT (Diffusion Transformer) Fine-Tuning Experiments

**Core Backbone for the Zulense Z1 Foundation Model**

This repository hosts the Diffusion Transformer (DiT) checkpoints trained to generate educational video content. These models operate in the latent space of our Causal VAE and are responsible for the temporal consistency and logical flow of the generated math lectures.

## 📂 Model Ledger & Performance

We are releasing the training logs to demonstrate the optimization curve of the "Imagination Engine."

### 1. `finetune_2_pytorch_model.bin` (🌟 Production Candidate)

- **Role:** The Z1 Foundation Backbone
- **Status:** ✅ Converged / High Fidelity
- **Performance:**
  - This checkpoint represents our stable run. It successfully learned to align temporal attention with the "teacher's movement" and "blackboard writing" logic.
  - **Metrics:** Achieved the target validation loss on the Class 5 & 8 Math dataset.
  - **Behavior:** Shows strong temporal coherence (objects don't disappear randomly) and adheres to the physics of writing on a board.
  - **Recommendation:** Use this file for all inference tasks related to Zulense Z1 (see the download sketch below).
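If you are pulling the checkpoint from the Hub rather than a local clone, `huggingface_hub` can fetch and cache it for you. A minimal sketch; the `repo_id` below is a placeholder, so substitute this repository's actual ID:

```python
from huggingface_hub import hf_hub_download

# "zulense/z1-dit" is a placeholder repo ID; replace it with this repository's real ID.
checkpoint_path = hf_hub_download(
    repo_id="zulense/z1-dit",
    filename="finetune_2_pytorch_model.bin",
)
print(f"Checkpoint cached at: {checkpoint_path}")
```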

### 2. `finetune_1_pytorch_model.bin` (Experimental / Deprecated)

- **Role:** Initial Warmup Run
- **Status:** ⚠️ Underfitted / High Noise
- **Performance:**
  - This was an early checkpoint in which the model struggled to decouple the background (classroom) from the foreground (teacher).
  - **Issues:** Produced "flickering" artifacts and poor text alignment.
  - **Archived:** Kept for research comparison to show the impact of the improved data scheduling in `finetune_2` (a comparison sketch follows below).
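Since both checkpoints live side by side, one way to quantify what changed between the warmup and converged runs is to compare per-tensor weight drift. A rough sketch, assuming both files hold plain state dicts with identical key sets:

```python
import torch

# Local paths; adjust to wherever the two checkpoints were downloaded.
early = torch.load("finetune_1_pytorch_model.bin", map_location="cpu")
final = torch.load("finetune_2_pytorch_model.bin", map_location="cpu")

# Both runs share the DiT architecture, so the key sets should match.
assert early.keys() == final.keys()

# Rank parameters by how far the converged run moved from the warmup run.
drift = {
    name: (final[name].float() - early[name].float()).norm().item()
    for name in final
}
for name, delta in sorted(drift.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{delta:10.3f}  {name}")
```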

πŸ—οΈ Architecture Context

The Zulense Video Pipeline follows a two-stage generation process:

1. **Stage 1 (VAE):** Compresses video into latents (see `causal_vae_checkpoint`).
2. **Stage 2 (DiT):** This model (`finetune_2`) acts as the denoising backbone, predicting latent patches over time conditioned on text prompts (e.g., "Draw a triangle with 3 angles"); the sampling loop is sketched below.
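To make the division of labor concrete, here is a structural sketch of where the DiT sits in the sampling loop. Everything below is illustrative, not the project's actual code: `DiTBackbone` is a hypothetical stand-in, the tensor shapes are made up, and the update rule is a placeholder for a real sampler step.

```python
import torch

class DiTBackbone(torch.nn.Module):
    """Stand-in denoiser: the real model predicts noise for latent video patches."""
    def forward(self, latents, timestep, text_embedding):
        return torch.zeros_like(latents)  # placeholder; real model returns noise prediction

dit = DiTBackbone()
text_embedding = torch.randn(1, 77, 768)   # e.g., output of a text encoder
latents = torch.randn(1, 16, 4, 32, 32)    # (batch, frames, channels, h, w), illustrative

# Stage 2: iterative denoising in the VAE's latent space.
for t in reversed(range(50)):              # 50 sampling steps, illustrative
    noise_pred = dit(latents, torch.tensor([t]), text_embedding)
    latents = latents - 0.02 * noise_pred  # placeholder update; a real sampler goes here

# Stage 1's decoder (see causal_vae_checkpoint) would then map
# `latents` back to pixel-space video frames.
```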

## 💻 Usage (Loading Weights)

```python
import torch

# Path to the best-performing checkpoint
model_path = "finetune_2_pytorch_model.bin"

# Load weights (assuming a standard DiT state dict)
state_dict = torch.load(model_path, map_location="cpu")

print(f"✅ Loaded DiT Backbone: {model_path}")
print(f"Tensor keys found: {len(state_dict)}")
```