---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
datasets:
- opendiffusionai/laion2b-squareish-1536px
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/examples/side_by_side_b.png
base_model: jimmycarter/LibreFLUX
---

# LibreFLUX-ControlNet

![Example: Control image vs result](examples/side_by_side_b.png)

# Update - 4/10/2026

- Retrained this model on [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- I tripled the control layers to get better guidance

# Fun Facts

- Trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- Takes SAM-style segmentation images as input and outputs photorealistic images ( a sketch for producing one follows below )
- Trained at 1024x1024 resolution; inference works best at 1.5k and up
- Trained on 320K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Base model is [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX) ( de-distilled FLUX )
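# Making a Control Image

The control inputs are the colorful mask overlays produced by SAM's automatic mask generator. Below is a minimal sketch using Meta's `segment-anything` package; the ViT-H checkpoint path and the random-color painting are illustrative assumptions, not the exact preprocessing used for training.

```py
# Hedged sketch: turn a photo into a SAM-style segmentation image.
# Assumes `pip install segment-anything` and a downloaded ViT-H checkpoint;
# the coloring scheme here is an assumption for illustration.
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("photo.png").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts with a boolean "segmentation" map

# Paint each region a random color, largest first so small regions stay visible
canvas = np.zeros_like(image)
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    canvas[m["segmentation"]] = np.random.randint(0, 256, size=3)

Image.fromarray(canvas).save("control_image.png")
```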
# Showcases

# Extra Details

- I built this repo to train the model: [https://github.com/NeuralVFX/LibreFLUX-ControlNet](https://github.com/NeuralVFX/LibreFLUX-ControlNet)
- Trained in the same non-distilled fashion as [LibreFLUX](https://huggingface.co/jimmycarter/LibreFLUX)
- Uses attention masking
- Uses CFG during inference ( allows negative prompting )
- Inference code roughly adapted from: [https://github.com/bghira/SimpleTuner](https://github.com/bghira/SimpleTuner)

# ComfyUI

- I've made some custom nodes for this: [https://github.com/NeuralVFX/LibreFLUX-ComfyUI](https://github.com/NeuralVFX/LibreFLUX-ComfyUI)

# Compatibility

```sh
pip install -U diffusers==0.32.0
pip install -U "transformers @ git+https://github.com/huggingface/transformers@e15687fffe5c9d20598a19aeab721ae0a7580f8a"
```

Low VRAM:

```sh
pip install optimum-quanto
```

# Load Pipeline

```py
import torch
from diffusers import DiffusionPipeline

model_id = "neuralvfx/LibreFlux-ControlNet"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
).to(device)
```

# Inference

```py
import torch
from PIL import Image
from torchvision.transforms import ToTensor

# Load the SAM-style control image
cond = Image.open("examples/libre_flux_control_image.png")
cond = cond.resize((1024, 1024))

# Convert the PIL image to a (1, 3, H, W) tensor on the pipeline's device and dtype
cond_tensor = ToTensor()(cond)[:3, :, :].to(pipe.device, dtype=pipe.dtype).unsqueeze(0)

out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    controlnet_conditioning_scale=1.0,
    num_images_per_prompt=1,
    control_mode=None,
    generator=torch.Generator().manual_seed(32),
    return_dict=True,
)

out.images[0]
```

# Load Pipeline ( Low VRAM )

```py
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, quantize, qint8

model_id = "neuralvfx/LibreFlux-ControlNet"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline=model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    safety_checker=None,
)

# Quantize the transformer and controlnet weights to int8, keeping
# norm, embedding, and projection layers in full precision
exclude = [
    "*.norm",
    "*.norm1",
    "*.norm2",
    "*.norm2_context",
    "proj_out",
    "x_embedder",
    "norm_out",
    "context_embedder",
]
quantize(pipe.transformer, weights=qint8, exclude=exclude)
quantize(pipe.controlnet, weights=qint8, exclude=exclude)
freeze(pipe.transformer)
freeze(pipe.controlnet)

pipe.enable_model_cpu_offload()
```
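# Inference ( Low VRAM )

Generation with the quantized, CPU-offloaded pipeline works the same as in the Inference section above; `enable_model_cpu_offload()` handles device placement, so skip the `.to(device)` call. A minimal sketch, reusing `cond_tensor` from that section:

```py
# Minimal sketch: run the quantized, CPU-offloaded pipeline and save the result
out = pipe(
    prompt="many pieces of drift wood spelling libre flux sitting casting shadow on the lumpy sandy beach with foot prints all over it",
    negative_prompt="blurry",
    control_image=cond_tensor,  # same (1, 3, H, W) control tensor as above
    num_inference_steps=75,
    guidance_scale=4.0,
    height=1024,
    width=1024,
    generator=torch.Generator().manual_seed(32),
)
out.images[0].save("libre_flux_result.png")
```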