LoRAs for LTX 2.3

Here I will share some LoRAs that I trained for LTX 2.3.

These LoRAs may cover different use cases over time, so this repository is not limited to inpainting only.

Models

  • ltx23_inpaint_rank128_v1_02500steps.safetensors
    Sometimes follows the prompt better, likely because it is less overfitted.
  • ltx23_inpaint_rank128_v1_10000steps.safetensors
    Sometimes misses instructions because it focuses more on the size of the mask, but it generally makes better use of the masked region. This is likely the result of longer training on a more limited dataset.
  • ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors
    Inpainting LoRA with reference support: it allows inpainting while also using a visual reference, which can guide the desired replacement more precisely. Prompt quality is extremely important for good results, and mask size matters even more.
  • ltx23_edit_anything_global_rank128_v1_6000steps_prodigy or ltx23_edit_anything_global_rank128_v1_9000steps_adamw.safetensors
    Experimental Edit Anything LoRA trained on 8,000 video pairs for add / remove / replace / style transformations. Best used for experimentation, prompt testing, and building synthetic datasets, especially style-focused ones.

Use whatever suits you best.

Important inference notes for the inpainting LoRAs

These inpainting LoRAs were trained with a specific guide and mask setup, so input preparation during inference is important.

How to use the mask

During inference, you should not pass the mask as a separate channel.

The mask must be embedded into the guide video: the mask video and the guide video must be combined into a single video.

Then use the LTXVAddGuideMulti node to pass the combined guide video to the model.

Required colors

For inference to match the training setup, the colors are important:

  • the mask must be magenta: (255, 0, 255)
  • the green area of the reference must be chroma key green: (0, 255, 0)
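As a rough sketch (not the repository's actual preprocessing code), baking the mask into a guide frame with NumPy might look like this; the helper name and frame size are illustrative:

```python
import numpy as np

MAGENTA = np.array([255, 0, 255], dtype=np.uint8)  # required mask color

def embed_mask_in_guide(guide_frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Paint the masked region of a guide frame magenta.

    guide_frame: (H, W, 3) uint8 RGB frame of the guide video.
    mask:        (H, W) boolean array, True where inpainting should occur.
    """
    out = guide_frame.copy()
    out[mask] = MAGENTA  # the mask is baked into the guide, not a separate channel
    return out

# Tiny demo: a 4x4 gray frame with a 2x2 masked corner
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
result = embed_mask_in_guide(frame, mask)
```

Applied per frame, this produces the single combined video that LTXVAddGuideMulti expects.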

About the mask format used during training

My dataset included samples where the mask was blockified: the default pattern used 8x8 blocks.

To better reproduce the training conditions during inference, you can use:

  • Blockify Mask from KJNodes

This may help make the mask distribution closer to what the model saw during training.

For the new reference-based inpainting LoRA, this is especially important:

  • sometimes you need to blockify so the mask becomes agnostic to the previous object's shape
  • sometimes you need to expand the mask to give the new object more room to work properly
  • a good default is Blockify Mask with size 8
  • you can expand the block size up to 512, which effectively turns the mask into a full rectangle
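For intuition, here is a minimal NumPy sketch of the blockify idea (the KJNodes implementation may differ in details): any block containing at least one masked pixel becomes fully masked.

```python
import numpy as np

def blockify_mask(mask: np.ndarray, block: int = 8) -> np.ndarray:
    """Snap a boolean mask to a coarse block grid (max-pool, then upsample)."""
    h, w = mask.shape
    ph, pw = -h % block, -w % block          # pad up to a multiple of `block`
    padded = np.pad(mask, ((0, ph), (0, pw)))
    blocks = padded.reshape(padded.shape[0] // block, block,
                            padded.shape[1] // block, block)
    coarse = blocks.any(axis=(1, 3))          # True if any pixel in the block is True
    full = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    return full[:h, :w]

# Demo: a single masked pixel expands to a full 8x8 block
m = np.zeros((16, 16), dtype=bool)
m[3, 5] = True
b = blockify_mask(m, block=8)
```

Larger block sizes coarsen the mask further, which is why size 512 effectively yields a full rectangle.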

Notes

  • Base model: Lightricks/LTX-2.3
  • Checkpoint behavior may vary significantly in terms of:
    • prompt adherence
    • use of the masked area
    • overfitting tendency
  • For the reference-based LoRA, prompting is extremely important for result quality
  • For the reference-based LoRA, mask size and mask preparation are critical

Practical recommendations

For the inpainting LoRAs in this repo:

  • If you want better prompt adherence, try the 2500 steps checkpoint first
  • If you want better use of the masked area, try the 10000 steps checkpoint first
  • If you want an inpainting LoRA that works very well both with a visual reference and in a text-only setup, try ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors

The best approach is to compare the checkpoints directly in your workflow, since the best choice may vary with the scene, mask, and prompt.


Experimental Edit Anything LoRA

Training status: This training is still in progress, and this model may be updated unexpectedly over time as new experiments and checkpoints are tested.

This repo also includes an experimental Edit Anything LoRA for video editing.

Model file: ltx23_edit_anything_global_rank128_v1_9000steps_adamw.safetensors
Model location: https://huggingface.co/Alissonerdx/LTX-LoRAs/blob/main/ltx23_edit_anything_global_rank128_v1_9000steps_adamw.safetensors

Recommended workflow: ltx23_edit_anything_v1

This LoRA was trained on 8,000 video pairs and was designed as a research / experimentation checkpoint rather than a polished production model.

So expectations should be set accordingly:

  • this is still experimental
  • it was not trained with professional-grade output quality as the main target
  • it is especially useful for testing edit behavior, prompt structures, and dataset-building workflows
  • it is a good model for building synthetic datasets, especially style-oriented datasets

Prompt patterns used during training

The training captions were organized around four core task types:

Add
Add a/an [subject/object] with [clear visual attributes], [precise location in the scene].

Remove
Remove the [subject/object] [location or identifying description].

Replace
Replace the [original subject/object] [location] with a/an [new subject/object] with [clear visual attributes].

Convert / Style
Convert the video into a [style name] style.

The prompts that worked best were usually:

  • action-first
  • visually specific
  • spatially grounded
  • written for video scenes, not still images
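As an illustration, the four caption templates can be turned into a tiny prompt builder; the helper name and field names below are hypothetical, not part of any released tooling:

```python
# Hypothetical prompt builder mirroring the four training caption templates.
TEMPLATES = {
    "add":     "Add {article} {subject} with {attributes}, {location}.",
    "remove":  "Remove the {subject} {location}.",
    "replace": "Replace the {subject} {location} with {article} {new_subject} with {attributes}.",
    "style":   "Convert the video into a {style} style.",
}

def build_prompt(task: str, **fields: str) -> str:
    """Fill in one of the four task templates with concrete fields."""
    return TEMPLATES[task].format(**fields)

p = build_prompt("add", article="a", subject="black vintage typewriter",
                 attributes="white keys and silver accents",
                 location="on the right side of the wooden counter")
```

Keeping prompts action-first and spatially grounded, as the templates do, matches what the model saw in training.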

Common subject and object types

Some of the most common object families in the training prompts were:

  • men
  • women
  • people
  • robots
  • dogs
  • cats
  • characters
  • laptops
  • buildings
  • plants / trees

In practice, the model saw a lot of:

  • people and body-centered edits
  • animals and stylized creatures
  • handheld objects and props
  • vehicles and scene replacements
  • buildings and background elements
  • style conversion prompts

Style coverage

Some of the more interesting styles present in the training prompts included:

  • Pencil Sketch
  • Flat Vector Illustration
  • Flat Vector Cartoon
  • Watercolor Painting
  • Digital Oil Painting
  • Van Gogh
  • Pop Art
  • 3D Chibi
  • Play-Doh
  • Claymation
  • Comic Book
  • American Cartoon
  • Cel-Shaded Anime
  • Ghibli-like animation looks
  • Traditional Chinese Painting

Inference notes for the Edit Anything LoRA

One of the biggest inference factors for this model is CFG.

A good starting point is to first test a distilled model with CFG = 1.

If the edit is too weak or the model is not following the prompt closely enough, increasing CFG is usually the fix.

In some cases, increasing the distill LoRA strength to around 1.2 can also help.

Another useful setup is to use the base model together with the distill LoRA, because that gives more room to balance:

  • LoRA strength
  • CFG
  • number of steps

This is often a better way to reach a more stable tradeoff between:

  • prompt adherence
  • edit strength
  • visual stability
  • overall controllability

Practical takeaways

  • start with a distilled setup at CFG = 1
  • if the edit is too weak, increase CFG
  • if needed, also increase LoRA strength slightly, for example to 1.2
  • if you want more control, try the base model + distill LoRA instead of relying only on a fixed distilled setup
  • better prompts are usually more specific and more spatially grounded
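If you want to compare settings systematically, a small grid over CFG and LoRA strength is enough. The values below simply mirror the takeaways above; they are starting points, not tested optima:

```python
from itertools import product

# Candidate settings: start at CFG 1 with a distilled setup, raise CFG if
# edits are weak; a LoRA strength around 1.2 can also help.
cfg_values = [1.0, 2.0, 3.0]
lora_strengths = [1.0, 1.2]

runs = [{"cfg": c, "lora_strength": s} for c, s in product(cfg_values, lora_strengths)]
```

Running the same prompt across such a grid makes it easy to see where prompt adherence and visual stability trade off.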

Example prompts for the Edit Anything LoRA

Example 1 — Add

Prompt:
Add a black vintage typewriter with white keys and silver accents on the right side of the wooden counter.

Example 2 — Remove

Prompt:
Remove the person standing in the center of the frame.

Example 3 — Replace

Prompt:
Replace the blue car on the road with a large white and blue ship occupying the same position in the scene.

Example 4 — Convert / Style

Prompt:
Convert the video into a watercolor painting style.


Examples — 2500 Steps

Example 1

Model: ltx23_inpaint_rank128_v1_02500steps.safetensors

Video: videos/sample_1_inpaint_2500.mp4

Prompt:


Examples — 10000 Steps

Example 1

Model: ltx23_inpaint_rank128_v1_10000steps.safetensors

Video: videos/sample_1_inpaint_10000.mp4

Prompt:


Examples — Reference Inpainting (R2V) — 3000 Steps

This LoRA can be used in two ways:

  • Reference-guided inpainting, where the reference image actively guides the replacement
  • Text-only style inpainting, by sending a blank image as the reference input

A practical issue to keep in mind is identity leakage. In some scenes, if the prompt is not specific enough, the model may copy identity traits or visual details from another character already present in the source scene instead of following the intended reference closely. This is especially important for full-body references, so prompt specificity matters a lot.
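For the text-only mode, the blank reference can be as simple as a solid black frame. The resolution below is an assumption; match whatever your workflow expects:

```python
import numpy as np

# Blank (solid black) RGB image to send as the reference input
# when using the R2V LoRA in text-only inpainting mode.
blank_reference = np.zeros((512, 512, 3), dtype=np.uint8)
```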

Example 1

Model: ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors

Video: videos/sample_1_masked_r2v_3000.mp4

Prompt:
A man resembling Donald Trump playing an electric guitar on stage, making energetic performance movements, with a confident pose, expressive body language, and dynamic rockstar attitude. He is holding the guitar dramatically while performing, with stage lighting, motion, and a lively concert atmosphere.

Note: This example used a full-body reference. The prompt had to be very specific; otherwise, the model leaked identity details from another character already present in the scene instead of following the intended reference.


Example 2

Model: ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors

Video: videos/sample_2_masked_r2v_3000.mp4

Prompt:

A nighttime mountain road drift scene with a Tesla Cybertruck performing a dramatic high-speed drift around a sharp curve, sliding sideways across the asphalt with aggressive motion and strong driving energy. The Cybertruck has its iconic angular triangular wedge-shaped body, metallic panels, sharp geometric silhouette, bright headlights, and futuristic design clearly visible. Tire smoke and drift streaks trail behind the vehicle, matching the speed and intensity of the original action scene. Streetlights illuminate the road, the forest background remains dark and dense, and the camera perspective stays low and cinematic, emphasizing motion, speed, and control. The Cybertruck is fully integrated into the scene with realistic scale, lighting, shadows, reflections, and ground contact.


Example 3

Model: ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors

Video: videos/sample_3_masked_r2v_3000.mp4

Prompt:

A nighttime mountain road drift scene with a classic Volkswagen Beetle performing a dramatic high-speed drift around a sharp curve, sliding sideways across the asphalt with strong motion and driving energy. The Beetle has its iconic rounded body, compact vintage shape, circular headlights, curved roofline, and unmistakable classic design clearly visible. Tire smoke and drift streaks trail behind the car, matching the speed and intensity of the original action scene. Streetlights illuminate the road, the forest background remains dark and dense, and the camera perspective stays low and cinematic, emphasizing motion, speed, and control. The Beetle is fully integrated into the scene with realistic scale, lighting, shadows, reflections, and ground contact.

Note: This example shows that the masked R2V model can also be used like a text-only inpainting model without a real reference. To do that, simply send a blank image in place of the reference image.


Example 4

Model: ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors

Video: videos/sample_4_masked_r2v_3000.mp4

Prompt:
A man riding a tiger on a rural road in daylight, with the tiger rearing upward in a dramatic wheelie-like pose, matching the same dynamic action and position as the original motorcycle. The tiger is large, powerful, and realistic, with orange fur, black stripes, strong muscles, and natural anatomy clearly visible. The man is balanced on top of the tiger as if controlling it during the stunt, with believable body posture and strong motion energy. Keep the same road, camera angle, lighting, shadows, background, and overall composition unchanged. The scene should feel like a real action moment captured outdoors.
