Very good quant

#1
by wonderfuldestruction - opened
llama.cpp b8121
Devstral-Small-2-24B-Instruct-2512-IQ4_XS-4.04bpw.gguf
RTX 5090

I have to give very good feedback. I've been using LM Studio's Q8_0 as my daily driver, and this quant performs identically, though it is less verbose in docs.

I've got forks of my own production environments where I test these quants, and yours completes tasks practically the same as the Q8_0, albeit faster and with a larger context window at the same settings.

I have yet to find any major flaws. Until then, great work!

ByteShape org

Thank you for the thoughtful feedback and kind words. It means a lot.
If you run into any edge cases or weaknesses, we’d really appreciate hearing about them. That kind of feedback helps us refine our datasets, datatype selection, and benchmarks for future releases.

You can see here that it did extremely well against Unsloth's Devstral Small 2 Q6_K:

https://www.reddit.com/r/LocalLLaMA/comments/1rg41ss/qwen35_27b_vs_devstral_small_2_nextjs_solidity/

ByteShape org

Thank you very much for taking the time to evaluate and compare these models and for sharing such a comprehensive report. It is very helpful.

I'm really hoping we see a ByteShape release of qwen3.5-27b.

I want to send all quanters to ByteShape University!
