GGUF version for Local inference
#2
by
rishieee - opened
Hi, thank you for publishing TSLAM!!
I am trying to run this model locally with Ollama and/or llama.cpp on a Mac. Currently, the model files in this repository are already quantised with bitsandbytes (4-bit), which llama.cpp cannot convert to GGUF, and the Transformers library has trouble running them directly on MPS (Mac).
Could you please upload:
A GGUF version of the model (for Ollama), or
The unquantized (FP16 or BF16) weights so I can convert them myself?
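For reference, if the unquantized weights were available, the conversion I have in mind would look roughly like this (a sketch using llama.cpp's standard conversion and quantization tools; the model path and output filenames are placeholders, not anything from this repo):

```shell
# Clone llama.cpp and install the conversion script's Python dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the FP16/BF16 Hugging Face checkpoint to a GGUF file
python convert_hf_to_gguf.py /path/to/tslam-fp16 \
    --outfile tslam-f16.gguf --outtype f16

# Optionally quantize the GGUF for smaller memory footprint
# (requires building llama.cpp first, e.g. with cmake)
./llama-quantize tslam-f16.gguf tslam-Q4_K_M.gguf Q4_K_M
```

The resulting GGUF could then be loaded by Ollama via a Modelfile (`FROM ./tslam-Q4_K_M.gguf`) or run directly with llama.cpp. This path only works from full-precision weights, which is why the bitsandbytes 4-bit files block it.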