Nunchaku version
Hi hugless, have you tried it on your computer? This model actually runs very well on an 8GB GPU, and it can even run on a 6GB GPU.
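For anyone wondering how that works on 6-8GB cards, here's a minimal sketch using diffusers' CPU offloading. The model ID is an assumption (swap in whatever checkpoint you actually use); this is just an illustration, not the exact setup from this thread:

```python
# Minimal low-VRAM sketch (requires diffusers + accelerate).
# "Qwen/Qwen-Image" is an assumed/illustrative model ID.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
)

# Keep submodules on the CPU and move each to the GPU only while it runs.
# This is what makes 6-8GB cards workable, at the cost of some speed.
# For very tight VRAM, enable_sequential_cpu_offload() is even more aggressive.
pipe.enable_model_cpu_offload()

image = pipe("a cat on a skateboard", num_inference_steps=20).images[0]
image.save("out.png")
```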
I ran this on a 12GB GPU just fine... I don't think Nunchaku supports LoRAs for Qwen...
That's interesting. I've always avoided models larger than 16GB because I thought they would run very slowly.
Running at 1 CFG, fp8, and 4 steps really helps. Nunchaku and lower quants are even faster, but GGUFs have some quality concerns, and Nunchaku makes working with LoRAs difficult (if not impossible).
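To make that setup concrete, here's a rough sketch of a few-step, CFG-1 run. It assumes a distilled/lightning-style checkpoint (4-step sampling doesn't work on a vanilla model), the model ID is illustrative, and fp8 is left out for simplicity (in practice that usually means a quantized checkpoint or layerwise casting):

```python
# Sketch of a fast 4-step, CFG-1 run. Assumes a few-step-distilled model;
# "Qwen/Qwen-Image" is a placeholder ID, not a verified recipe.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "a lighthouse at dusk",
    num_inference_steps=4,  # only viable with a distilled model or lightning LoRA
    guidance_scale=1.0,     # CFG 1 skips the unconditional pass, roughly halving the work
                            # (some pipelines name this knob true_cfg_scale instead)
).images[0]
image.save("fast.png")
```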
I can confirm it runs very well on my 6GB VRAM NVIDIA 3060 laptop GPU.
I'm trying it next, once the bf16-optimized GGUFs are up.
Since I've got a v5 NSFW bf16 safetensors, I think it's an easy task; getting the bf16 took over a day on my potato.
But I can't test on an AMD MI25...
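For anyone curious what "getting the bf16" involves, here's a minimal sketch of down-casting a safetensors checkpoint. File names are placeholders, and producing an actual GGUF is a separate repacking step (e.g. with a converter like ComfyUI-GGUF's tools):

```python
# Sketch: cast a full-precision safetensors checkpoint to bf16.
# File names are placeholders. This only handles the dtype cast;
# packing into GGUF is a separate step done by a converter tool.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model-fp32.safetensors")  # placeholder input path

# Cast floating-point tensors to bf16; leave integer tensors untouched.
state_bf16 = {
    name: t.to(torch.bfloat16) if t.is_floating_point() else t
    for name, t in state.items()
}

save_file(state_bf16, "model-bf16.safetensors")  # placeholder output path
```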
Haha. Only if you know specifically what NOT to quantize. And then, if Nunchaku doesn't understand mixed quants, the model will fail. The AIO model does a really bad job if quantized to, e.g., Q8.
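To make the "what NOT to quantize" point concrete, here's a hedged sketch of a skip-list approach to planning mixed quants. The name patterns below (norms, embeddings, final projection) are a common heuristic, not the verified sensitive layers for this model:

```python
# Sketch: a mixed-precision quantization plan that keeps sensitive tensors
# at high precision. The substrings below are an illustrative heuristic,
# NOT a verified list for this specific model.

SKIP_SUBSTRINGS = ("norm", "embed", "final_layer")  # illustrative patterns

def plan_quant(tensor_names, quant="Q8_0", keep="BF16"):
    """Map each tensor name to a target format, sparing the sensitive ones."""
    return {
        name: keep if any(s in name for s in SKIP_SUBSTRINGS) else quant
        for name in tensor_names
    }

print(plan_quant([
    "blocks.0.attn.qkv.weight",  # quantized to Q8_0
    "blocks.0.norm1.weight",     # kept at BF16
    "txt_embed.weight",          # kept at BF16
]))
```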