Better Perplexity Alternative GGUFs

#13 by ubergarm

I have done some perplexity benchmarks on various GGUFs, and my impression is that there are better mixes available than this official "Int4" Q4_K_S. This is even more true if you want to use ik_llama.cpp with the newer SOTA quantization types.

https://huggingface.co/ubergarm/Step-3.5-Flash-GGUF#quant-collection

I've only released the best quants; the unreleased ones were used to guide the recipes:

[Figure: ppl-Step-3.5, perplexity comparison chart across quants]
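
If anyone wants to reproduce this kind of PPL comparison, here is a minimal sketch using llama.cpp's llama-perplexity; the download pattern, model filename, and offload settings are placeholders, so adjust them to whatever quant you grab:

```bash
# Grab a quant from the collection (file pattern is a placeholder)
huggingface-cli download ubergarm/Step-3.5-Flash-GGUF \
  --include "*IQ4_XS*" --local-dir ./Step-3.5-Flash-GGUF

# Measure perplexity over the usual wiki.test.raw corpus,
# offloading all layers to GPU
./build/bin/llama-perplexity \
  -m ./Step-3.5-Flash-GGUF/Step-3.5-Flash-IQ4_XS.gguf \
  -f wiki.test.raw \
  -ngl 99
```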

Also keep an eye on the good mixes by @AesSedai, which are available as well, with more PPL/KLD research data likely coming soon: https://huggingface.co/AesSedai/Step-3.5-Flash-GGUF
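
For context on how PPL/KLD numbers like these are typically gathered, a sketch using llama-perplexity's KL-divergence mode (the model names and calibration file here are placeholders):

```bash
# 1) Save reference logits from a full-precision model
./build/bin/llama-perplexity \
  -m Step-3.5-Flash-BF16.gguf -f calibration.txt \
  --kl-divergence-base logits.bin

# 2) Score a quant's KL divergence against those saved logits
./build/bin/llama-perplexity \
  -m Step-3.5-Flash-IQ4_XS.gguf \
  --kl-divergence-base logits.bin --kl-divergence
```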

Thanks for this nicely sized model and for supporting the whole ik/llama.cpp ecosystem!!! Cheers!

@ubergarm Am I understanding correctly that IQ4_XS is the best quant for 128GB? Is it compatible with upstream llama.cpp?
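
IQ4_XS is a mainline llama.cpp quant type, so assuming the mix uses only mainline tensor types, a quick smoke test against upstream could look like this (model path and parameters are placeholders):

```bash
# Smoke test with upstream llama.cpp (model path is a placeholder)
./build/bin/llama-cli \
  -m ./Step-3.5-Flash-IQ4_XS.gguf \
  -ngl 99 -c 8192 \
  -p "Briefly introduce yourself."
```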

I tried the IQ4_XS quant from ubergarm and it feels amazing!

Working well with ik_llama.cpp's -sm graph (Graph/Tensor Parallel) split mode.

[Figure: sweep-bench-Step-3.5-Flash benchmark chart]
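
For reference, a sweep-bench run along these lines might look as follows; only the -sm graph flag comes from the post above, and the remaining flags (model path, context size, offload) are placeholder assumptions:

```bash
# Throughput sweep with ik_llama.cpp's llama-sweep-bench;
# -sm graph enables the Graph/Tensor Parallel split mode mentioned above
./build/bin/llama-sweep-bench \
  -m ./Step-3.5-Flash-IQ4_XS.gguf \
  -c 32768 \
  -ngl 99 \
  -sm graph
```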
