Size of GGUF Quants

#15
by thomaier

Hello, just a quick question: why are all quantizations of this model roughly 60 GB in size? With other models, the quantization precision usually has a significant impact on the file size of the GGUF.

See the note at https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune#running-gpt-oss! Specifically:

Any quant smaller than F16, including 2-bit, has minimal accuracy loss, since only some parts (e.g., attention layers) are lower-bit while most remain full precision. That's why the sizes are close to the F16 model; for example, the 2-bit (11.5 GB) version performs nearly the same as the full 16-bit (14 GB) one. Once llama.cpp supports better quantization for these models, we'll upload them ASAP.
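For intuition, here is a minimal back-of-the-envelope sketch, not the actual gpt-oss tensor layout: it assumes a uniformly F16 baseline and a flat 2-bit target, and infers from the two sizes quoted above how small the requantized share of the file must be.

```python
# Back-of-the-envelope estimate: infer what share of the file's weights
# must actually have changed precision to explain the size drop quoted
# above (11.5 GB at "2-bit" vs 14 GB at F16). The uniform-F16 baseline
# and the flat 16 -> 2 bit assumption are simplifications for illustration.

def quantized_share(size_full_gb: float, size_quant_gb: float,
                    base_bits: float = 16.0, quant_bits: float = 2.0) -> float:
    """Fraction of the full-precision file that must have been requantized
    from `base_bits` down to `quant_bits` to account for the size saving."""
    saved = size_full_gb - size_quant_gb
    # Each requantized byte shrinks by a factor of (1 - quant_bits/base_bits).
    return saved / (size_full_gb * (1.0 - quant_bits / base_bits))

print(f"{quantized_share(14.0, 11.5):.0%}")  # -> 20%: most tensors untouched
```

Under these simplified assumptions, only about a fifth of the file changes precision, so the total size barely moves between quants; the same effect explains why the quants of the 120b variant all cluster around 60 GB.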

Thank you for clarifying this. I had read this in the docs but had not realized that it would affect the size of the 120b variant this much.
