diffusion_model_quanto_Fp16_&_int8.safetensors
As the title suggests, if possible, could you please provide a version for older graphics card architectures?
Is a mixed-precision FP16 + INT8 version feasible? For example:
ltx-2.3-22b-distilled-1.1_diffusion_model_quanto_Fp16_int8.safetensors
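For what it's worth, here is a minimal sketch of how such an FP16 + INT8 mix could be produced with optimum-quanto (the library implied by "quanto" in the filename). The tiny Sequential model is just a placeholder for the real transformer, not the actual loading code:

```python
import torch
import torch.nn as nn
from optimum.quanto import quantize, freeze, qint8

# Placeholder model standing in for the real diffusion transformer.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

model.to(torch.float16)         # cast first, so quantization scales end up FP16
quantize(model, weights=qint8)  # INT8 weights; everything else stays FP16
freeze(model)                   # materialize the quantized weight tensors

x = torch.randn(1, 64, dtype=torch.float16)
with torch.inference_mode():
    print(model(x).dtype)       # torch.float16
```

Casting to FP16 before quantizing matters here: the per-channel `_scale` tensors inherit the weight dtype, so they come out FP16 instead of BF16.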
It should already work with older cards, since BF16 is converted to FP16 when needed. Are you running into any issue?
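For reference, this is roughly how a PyTorch-based runtime can decide when that fallback kicks in; the helper below is only an illustration, not code from this repo:

```python
import torch

def pick_dtype() -> torch.dtype:
    # Use BF16 only where the GPU supports it natively (Ampere and newer);
    # otherwise fall back to FP16, e.g. on Volta or Turing cards.
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16

print(pick_dtype())
```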
Regarding INT8: it is supported on older card lines like Volta, but those cards may not support the BF16 format. So if the BF16 tensors could be converted to FP16, the model could fully utilize the FP16 mixed-precision Tensor throughput (250 TFLOPS) and plain FP16 half-precision throughput (125 TFLOPS) on those architectures. For example, these weights are stored in BF16 (see the conversion sketch after the list):
- transformer_blocks.0.audio_ff.net.0.proj.bias [8192] BF16
- …attn.to_k.weight._scale [2048, 1] BF16
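A one-off conversion along those lines could look like this sketch; the input and output filenames are placeholders. BF16 tensors (including the `_scale` tensors above) are cast to FP16 while any INT8 payload tensors pass through untouched:

```python
import torch
from safetensors.torch import load_file, save_file

# Placeholder paths, not actual files from this repo.
state = load_file("diffusion_model_bf16.safetensors")
state = {
    name: t.to(torch.float16) if t.dtype == torch.bfloat16 else t
    for name, t in state.items()
}
save_file(state, "diffusion_model_fp16.safetensors")
```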
In my opinion, BF16 isn't native on all the older NVIDIA architectures, so it might take some time to convert.