This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: HeliosPyramidAutoBlocks

Description: Auto Modular pipeline for pyramid progressive generation (T2V/I2V/V2V) using Helios.

This pipeline uses a 4-block architecture that can be customized and extended.

Example Usage

[TODO]
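Pending an official example, here is a minimal sketch. It assumes the Modular Diffusers API (`ModularPipeline.from_pretrained`, `load_default_components`) and uses a placeholder repo id; argument names follow the input specification below, and exact method names may vary with your Diffusers version.

```python
import torch
from diffusers.modular_pipelines import ModularPipeline

# Placeholder repo id; replace with this model's actual Hub id.
pipe = ModularPipeline.from_pretrained("<repo-id>", trust_remote_code=True)
pipe.load_default_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

# text2video: with only `prompt` given, the auto blocks select the T2V path.
videos = pipe(
    prompt="A red fox running through fresh snow",
    num_frames=132,
    height=384,
    width=640,
    output="videos",
)
```

For image2video or video2video, additionally pass `image=` or `video=` as described in the workflow input specification below.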

Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. text_encoder (HeliosTextEncoderStep)
    • Text Encoder step that generates text embeddings to guide the video generation
  2. vae_encoder (HeliosPyramidAutoVaeEncoderStep)
    • Encoder step that encodes video or image inputs. This is an auto pipeline block.
  3. denoise (HeliosPyramidAutoCoreDenoiseStep)
    • Pyramid core denoise step that selects the appropriate denoising block.
  4. decode (HeliosDecodeStep)
    • Decodes all chunk latents with the VAE, concatenates them, trims to the target frame count, and postprocesses into the final video output.
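The decode step's chunk handling can be sketched in plain Python. This is illustrative only: `decode_chunk` stands in for the VAE decode, and the 4x frame expansion in the demo is an arbitrary stand-in, not the model's actual temporal compression ratio.

```python
def decode_and_trim(chunk_latents, num_frames, decode_chunk):
    """Decode each chunk of latents, concatenate along time, trim to num_frames.

    chunk_latents: list of per-chunk latent sequences
    decode_chunk:  callable mapping one latent chunk to a list of frames
    """
    frames = []
    for chunk in chunk_latents:
        frames.extend(decode_chunk(chunk))  # VAE decode, one chunk at a time
    return frames[:num_frames]              # drop padding frames past the target


# Stand-in decoder: each latent frame expands to 4 pixel frames.
fake_decode = lambda chunk: [f"frame_from_{z}" for z in chunk for _ in range(4)]
video = decode_and_trim([[0, 1], [2, 3]], 10, fake_decode)  # 16 frames, trimmed to 10
```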

Model Components

  1. text_encoder (UMT5EncoderModel)
  2. tokenizer (AutoTokenizer)
  3. guider (ClassifierFreeGuidance)
  4. vae (AutoencoderKLWan)
  5. video_processor (VideoProcessor)
  6. transformer (HeliosTransformer3DModel)
  7. scheduler (HeliosScheduler)

Workflow Input Specification

text2video
  • prompt (str): The prompt or prompts to guide video generation.
image2video
  • prompt (str): The prompt or prompts to guide video generation.
  • image (Image | list): Reference image(s) for denoising. Can be a single image or a list of images.
video2video
  • prompt (str): The prompt or prompts to guide video generation.
  • video: Input video for video-to-video generation.
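The auto blocks choose among these workflows based on which inputs are present. A rough sketch of that dispatch (illustrative, not the actual block logic):

```python
def select_workflow(prompt=None, image=None, video=None):
    """Pick the generation path the way the auto blocks do: by input presence."""
    if prompt is None:
        raise ValueError("`prompt` is required for every workflow")
    if video is not None:
        return "video2video"   # video input takes the V2V path
    if image is not None:
        return "image2video"   # reference image takes the I2V path
    return "text2video"        # prompt alone falls back to T2V
```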

Input/Output Specification

Inputs:

  • prompt (str): The prompt or prompts to guide video generation.
  • negative_prompt (str, optional): The prompt or prompts not to guide the video generation.
  • max_sequence_length (int, optional, defaults to 512): Maximum sequence length for prompt encoding.
  • video (optional): Input video for video-to-video generation.
  • height (int, optional, defaults to 384): The height in pixels of the generated video.
  • width (int, optional, defaults to 640): The width in pixels of the generated video.
  • num_latent_frames_per_chunk (int, optional, defaults to 9): Number of latent frames per temporal chunk.
  • generator (Generator, optional): Torch generator for deterministic generation.
  • image (Image | list, optional): Reference image(s) for denoising. Can be a single image or list of images.
  • num_videos_per_prompt (int, optional, defaults to 1): Number of videos to generate per prompt.
  • image_latents (Tensor, optional): Image latents used to guide the video generation; can be produced by the vae_encoder step.
  • video_latents (Tensor, optional): Encoded video latents for V2V generation.
  • image_noise_sigma_min (float, optional, defaults to 0.111): Minimum sigma for image latent noise.
  • image_noise_sigma_max (float, optional, defaults to 0.135): Maximum sigma for image latent noise.
  • video_noise_sigma_min (float, optional, defaults to 0.111): Minimum sigma for video latent noise.
  • video_noise_sigma_max (float, optional, defaults to 0.135): Maximum sigma for video latent noise.
  • num_frames (int, optional, defaults to 132): Total number of video frames to generate.
  • history_sizes (list): Sizes of long/mid/short history buffers for temporal context.
  • keep_first_frame (bool, optional, defaults to True): Whether to keep the first frame as a prefix in history.
  • pyramid_num_inference_steps_list (list, optional, defaults to [10, 10, 10]): Number of denoising steps per pyramid stage.
  • latents (Tensor, optional): Pre-generated noisy latents for video generation.
  • **denoiser_input_fields (optional): Conditional model inputs for the denoiser, e.g. prompt_embeds, negative_prompt_embeds.
  • attention_kwargs (dict, optional): Additional kwargs for attention processors.
  • fake_image_latents (Tensor, optional): Fake image latents used as history seed for I2V generation.
  • output_type (str, optional, defaults to 'np'): Output format: 'pil', 'np', or 'pt'.
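The `*_noise_sigma_min`/`*_noise_sigma_max` inputs bound a per-call noise level applied to the conditioning latents. A minimal pure-Python sketch (function and variable names are illustrative, not the pipeline's internals):

```python
import random

def noise_conditioning_latents(latents, sigma_min=0.111, sigma_max=0.135, rng=None):
    """Perturb conditioning latents with Gaussian noise whose scale is drawn
    uniformly from [sigma_min, sigma_max]."""
    rng = rng or random.Random()
    sigma = rng.uniform(sigma_min, sigma_max)   # one noise level per call
    noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in latents]
    return noisy, sigma
```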
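`pyramid_num_inference_steps_list` gives one step count per pyramid stage, so the default [10, 10, 10] runs 30 denoising steps across three stages. Conceptually (with `denoise_step` as a stand-in for one scheduler/transformer update; the real loop lives in the denoise block):

```python
def run_pyramid(latents, steps_per_stage=(10, 10, 10), denoise_step=None):
    """Run each pyramid stage in order, stepping the denoiser num_steps times."""
    denoise_step = denoise_step or (lambda z, stage, step: z)
    for stage, num_steps in enumerate(steps_per_stage):
        for step in range(num_steps):
            latents = denoise_step(latents, stage, step)
    return latents
```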

Outputs:

  • videos (list): The generated videos.