Size of GGUF Quants

#15
by thomaier

Hello, just a quick question: why are all quantizations of this model roughly 60 GB in size? With other models, the quantization precision usually has a significant impact on the file size of the GGUF.

See the note at https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune#running-gpt-oss! Specifically:

Any quant smaller than F16, including 2-bit, has minimal accuracy loss, since only some parts (e.g., attention layers) are lower-bit while most remain full precision. That's why the sizes are close to the F16 model; for example, the 2-bit (11.5 GB) version performs nearly the same as the full 16-bit (14 GB) one. Once llama.cpp supports better quantization for these models, we'll upload them ASAP.
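For intuition, here is a minimal back-of-the-envelope sketch, not the actual gpt-oss tensor layout: it assumes a uniformly F16 baseline and a flat 2-bit target, and infers from the two sizes quoted above how small the requantized share of the file must be.

```python
# Back-of-the-envelope estimate: infer what share of the file's weights
# must actually have changed precision to explain the size drop quoted
# above (11.5 GB at "2-bit" vs 14 GB at F16). The uniform-F16 baseline
# and the flat 16 -> 2 bit assumption are simplifications for illustration.

def quantized_share(size_full_gb: float, size_quant_gb: float,
                    base_bits: float = 16.0, quant_bits: float = 2.0) -> float:
    """Fraction of the full-precision file that must have been requantized
    from `base_bits` down to `quant_bits` to account for the size saving."""
    saved = size_full_gb - size_quant_gb
    # Each requantized byte shrinks by a factor of (1 - quant_bits/base_bits).
    return saved / (size_full_gb * (1.0 - quant_bits / base_bits))

print(f"{quantized_share(14.0, 11.5):.0%}")  # -> 20%: most tensors untouched
```

Under these simplified assumptions, only about a fifth of the file changes precision, so the total size barely moves between quants; the same effect explains why the quants of the 120b variant all cluster around 60 GB.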

Thank you for clarifying this. I had read this in the docs but had not realized that it would affect the size of the 120b variant this much.
