Nunchaku version
Hi hugless, have you tried it on your computer? This model actually runs very well on an 8GB GPU, and it can even run on a 6GB GPU.
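For anyone wondering how that works on 6-8GB cards, here's a minimal sketch using diffusers' CPU offloading. The model ID is an assumption (swap in whatever checkpoint you actually use); this is just an illustration, not the exact setup from this thread:

```python
# Minimal low-VRAM sketch (requires diffusers + accelerate).
# "Qwen/Qwen-Image" is an assumed/illustrative model ID.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
)

# Keep submodules on the CPU and move each to the GPU only while it runs.
# This is what makes 6-8GB cards workable, at the cost of some speed.
# For very tight VRAM, enable_sequential_cpu_offload() is even more aggressive.
pipe.enable_model_cpu_offload()

image = pipe("a cat on a skateboard", num_inference_steps=20).images[0]
image.save("out.png")
```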
I ran this on a 12GB GPU just fine... I don't think Nunchaku supports LoRAs for Qwen...
That's interesting. I've always avoided models larger than 16GB because I thought they would run very slowly.
Running at 1 CFG, fp8, and 4 steps really helps. Nunchaku and lower quants are even faster, but GGUFs have some quality concerns, and Nunchaku makes working with LoRAs difficult (if not impossible).
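To make that setup concrete, here's a rough sketch of a few-step, CFG-1 run. It assumes a distilled/lightning-style checkpoint (4-step sampling doesn't work on a vanilla model), the model ID is illustrative, and fp8 is left out for simplicity (in practice that usually means a quantized checkpoint or layerwise casting):

```python
# Sketch of a fast 4-step, CFG-1 run. Assumes a few-step-distilled model;
# "Qwen/Qwen-Image" is a placeholder ID, not a verified recipe.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a lighthouse at dusk",
    num_inference_steps=4,  # only viable with a distilled model or lightning LoRA
    guidance_scale=1.0,     # CFG 1 skips the unconditional pass, roughly halving the work
                            # (some pipelines name this knob true_cfg_scale instead)
).images[0]
image.save("fast.png")
```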
I can confirm it runs very well on my 6GB VRAM NVIDIA 3060 laptop GPU.
I'm trying it next, once the bf16-optimized GGUFs are up.
Since I've got a v5 NSFW bf16 safetensors, I think it's an easy task; getting the bf16 took over a day on my potato.
But I can't test on an AMD MI25...
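For anyone curious what "getting the bf16" involves, here's a minimal sketch of down-casting a safetensors checkpoint. File names are placeholders, and producing an actual GGUF is a separate repacking step (e.g. with a converter like ComfyUI-GGUF's tools):

```python
# Sketch: cast a full-precision safetensors checkpoint to bf16.
# File names are placeholders. This only handles the dtype cast;
# packing into GGUF is a separate step done by a converter tool.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model-fp32.safetensors")  # placeholder input path

# Cast floating-point tensors to bf16; leave integer tensors untouched.
state_bf16 = {
    name: t.to(torch.bfloat16) if t.is_floating_point() else t
    for name, t in state.items()
}

save_file(state_bf16, "model-bf16.safetensors")  # placeholder output path
```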
Haha. Only if you know specifically what NOT to quantize. And then, if Nunchaku doesn't understand mixed quants, the model will fail. The AIO model does a really bad job if quantized to, e.g., Q8.
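To make the "what NOT to quantize" point concrete, here's a hedged sketch of a skip-list approach to planning mixed quants. The name patterns below (norms, embeddings, final projection) are a common heuristic, not the verified sensitive layers for this model:

```python
# Sketch: a mixed-precision quantization plan that keeps sensitive tensors
# at high precision. The substrings below are an illustrative heuristic,
# NOT a verified list for this specific model.

SKIP_SUBSTRINGS = ("norm", "embed", "final_layer")  # illustrative patterns

def plan_quant(tensor_names, quant="Q8_0", keep="BF16"):
    """Map each tensor name to a target format, sparing the sensitive ones."""
    return {
        name: keep if any(s in name for s in SKIP_SUBSTRINGS) else quant
        for name in tensor_names
    }

print(plan_quant([
    "blocks.0.attn.qkv.weight",  # quantized to Q8_0
    "blocks.0.norm1.weight",     # kept at BF16
    "txt_embed.weight",          # kept at BF16
]))
```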