Better Perplexity Alternative GGUFs

#13 by ubergarm

I have done some perplexity benchmarks on various GGUFs, and my impression is that there are better mixes available than this official "Int4" Q4_K_S. This is even more true if you want to use ik_llama.cpp with the newer SOTA quantization types.

https://huggingface.co/ubergarm/Step-3.5-Flash-GGUF#quant-collection

I've only released the best quants; the unreleased ones were used to guide the recipes:

[Figure: ppl-Step-3.5, perplexity comparison chart across quants]
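
If anyone wants to reproduce this kind of PPL comparison, here is a minimal sketch using llama.cpp's llama-perplexity; the download pattern, model filename, and offload settings are placeholders, so adjust them to whatever quant you grab:

```bash
# Grab a quant from the collection (file pattern is a placeholder)
huggingface-cli download ubergarm/Step-3.5-Flash-GGUF \
  --include "*IQ4_XS*" --local-dir ./Step-3.5-Flash-GGUF

# Measure perplexity over the usual wiki.test.raw corpus,
# offloading all layers to GPU
./build/bin/llama-perplexity \
  -m ./Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ4_XS.gguf \
  -f wiki.test.raw \
  -ngl 99
```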

Also keep an eye on the good mixes by @AesSedai, which are available as well, with more PPL/KLD research data likely coming soon: https://huggingface.co/AesSedai/Step-3.5-Flash-GGUF
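
For context on how PPL/KLD numbers like these are typically gathered, a sketch using llama-perplexity's KL-divergence mode (the model names and calibration file here are placeholders):

```bash
# 1) Save reference logits from a full-precision model
./build/bin/llama-perplexity \
  -m Step-3.5-Flash-BF16.gguf -f calibration.txt \
  --kl-divergence-base logits.bin

# 2) Score a quant's KL divergence against those saved logits
./build/bin/llama-perplexity \
  -m Step-3.5-Flash-IQ4_XS.gguf \
  --kl-divergence-base logits.bin --kl-divergence
```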

Thanks for this nicely sized model and for supporting the whole ik/llama.cpp ecosystem!!! Cheers!

@ubergarm Am I understanding correctly that IQ4_XS is the best quant for 128GB? Is it compatible with upstream llama.cpp?
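
IQ4_XS is a mainline llama.cpp quant type, so assuming the mix uses only mainline tensor types, a quick smoke test against upstream could look like this (model path and parameters are placeholders):

```bash
# Smoke test with upstream llama.cpp (model path is a placeholder)
./build/bin/llama-cli \
  -m ./Step-3.5-Flash-IQ4_XS.gguf \
  -ngl 99 -c 8192 \
  -p "Briefly introduce yourself."
```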

I tried the IQ4_XS quant from ubergarm and it feels amazing!

Working well with ik_llama.cpp's -sm graph (Graph/Tensor Parallel) split mode.

[Figure: sweep-bench-Step-3.5-Flash benchmark chart]
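
For reference, a sweep-bench run along these lines might look as follows; only the -sm graph flag comes from the post above, and the remaining flags (model path, context size, offload) are placeholder assumptions:

```bash
# Throughput sweep with ik_llama.cpp's llama-sweep-bench;
# -sm graph enables the Graph/Tensor Parallel split mode mentioned above
./build/bin/llama-sweep-bench \
  -m ./Step-3.5-Flash-IQ4_XS.gguf \
  -c 32768 \
  -ngl 99 \
  -sm graph
```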
