Will you Quantize Mistral Small 3.2 this way?

#3 · opened by cmacboyd

We really love this quantization and it works great with vLLM and tool calling for our use case. Will you be doing the same quantization for Mistral Small 3.2? 🙌

If you are not able to quantize Mistral Small 3.2, could you please advise how you ran this quantization? For example, which calibration dataset did you use?

I don't work at ISTA-DASLab, but they literally share the exact command and a link to the compression library in this model card. You can do it at home!

Thank you for your response. From what I can see, the model card shares the lm_eval command for evaluating the quantized and original LLMs on the OpenLLM benchmark (for comparison and to reproduce their results), links the compressed-tensors library, and shows how to use the quantized model. I don't see instructions or an exact command for producing the quantization, or the calibration dataset it used. Have I misread or misunderstood something?
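
For context, the evaluation command I mean looks roughly like this (the model id is a placeholder and the exact flags in the card may differ):

```
lm_eval --model vllm \
  --model_args pretrained=ISTA-DASLab/<quantized-model>,dtype=auto \
  --tasks openllm \
  --batch_size auto
```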

The compressed-tensors library is the one you can use to make the GPTQ quants. I don't work there though, maybe I'm wrong; just try it and see :-)

IST Austria Distributed Algorithms and Systems Lab org

Hi,

We can look into quantizing that specific model, but you might also want to give llm-compressor a try:
https://github.com/vllm-project/llm-compressor and in particular the examples: https://github.com/vllm-project/llm-compressor/tree/main/examples

This is basically a "production" version of the GPTQ algorithm together with some other methods.
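
A minimal sketch, following the W4A16 GPTQ example in that repo (the Mistral model id, calibration dataset, and sample count below are illustrative assumptions, not a tested recipe):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # newer releases also expose: from llmcompressor import oneshot

# Assumed HF id for Mistral Small 3.2. Note that 3.2 ships as a vision-language
# checkpoint, so you may need a different model class and extra `ignore` entries
# for the vision tower; check the repo's multimodal examples.
MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Calibration data: a few hundred chat samples is the usual starting point.
NUM_SAMPLES = 512
MAX_LEN = 2048
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
ds = ds.map(lambda x: {"text": tokenizer.apply_chat_template(x["messages"], tokenize=False)})
ds = ds.map(
    lambda x: tokenizer(x["text"], max_length=MAX_LEN, truncation=True, add_special_tokens=False),
    remove_columns=ds.column_names,
)

# GPTQ with 4-bit weights / 16-bit activations; keep the output head unquantized.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

SAVE_DIR = "Mistral-Small-3.2-24B-Instruct-W4A16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

The saved directory loads directly in vLLM (e.g. `vllm serve ./Mistral-Small-3.2-24B-Instruct-W4A16`).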

Best regards,
Dan
