Will you Quantize Mistral Small 3.2 this way?

#3 · opened by cmacboyd

We really love this quantization and it works great with vLLM and tool calling for our use case. Will you be doing the same quantization for Mistral Small 3.2? 🙌

If you are not able to quantize Mistral Small 3.2, could you please advise how you ran this quantization? For example, which calibration dataset did you use?

I don't work at ISTA-DASLab, but they literally share the exact command and a link to the compression library in this model card. You can do it at home!

Thank you for your response. From what I can see, the model card shares the lm_eval command for evaluating the quantized and original LLMs on the OpenLLM benchmark (for comparison and to reproduce their results), links the compressed-tensors library, and shows how to use the quantized model. I don't see instructions or an exact command for producing the quantization, or the calibration dataset it used. Have I misread or misunderstood something?
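
For context, the evaluation command I mean looks roughly like this (the model id is a placeholder and the exact flags in the card may differ):

```
lm_eval --model vllm \
  --model_args pretrained=ISTA-DASLab/<quantized-model>,dtype=auto \
  --tasks openllm \
  --batch_size auto
```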

The compressed-tensors library is the one you can use to make the GPTQ quants. I don't work there though, maybe I'm wrong; just try it and see :-)

IST Austria Distributed Algorithms and Systems Lab org

Hi,

We can look into quantizing that specific model, but you might also want to give llm-compressor a try:
https://github.com/vllm-project/llm-compressor and in particular the examples: https://github.com/vllm-project/llm-compressor/tree/main/examples

This is basically a "production" version of the GPTQ algorithm together with some other methods.
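
A minimal sketch, following the W4A16 GPTQ example in that repo (the Mistral model id, calibration dataset, and sample count below are illustrative assumptions, not a tested recipe):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # newer releases also expose: from llmcompressor import oneshot

# Assumed HF id for Mistral Small 3.2. Note that 3.2 ships as a vision-language
# checkpoint, so you may need a different model class and extra `ignore` entries
# for the vision tower; check the repo's multimodal examples.
MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration data: a few hundred chat samples is the usual starting point.
NUM_SAMPLES = 512
MAX_LEN = 2048
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
ds = ds.map(lambda x: {"text": tokenizer.apply_chat_template(x["messages"], tokenize=False)})
ds = ds.map(
    lambda x: tokenizer(x["text"], max_length=MAX_LEN, truncation=True, add_special_tokens=False),
    remove_columns=ds.column_names,
)

# GPTQ with 4-bit weights / 16-bit activations; keep the output head unquantized.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

SAVE_DIR = "Mistral-Small-3.2-24B-Instruct-W4A16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

The saved directory loads directly in vLLM (e.g. `vllm serve ./Mistral-Small-3.2-24B-Instruct-W4A16`).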

Best regards,
Dan
