GGUF version for Local inference
#2
by
rishieee - opened
Hi, thank you for publishing TSLAM!!
I am trying to run this model locally with Ollama and/or llama.cpp on a Mac. Currently, the model files in this repository are already quantised with bitsandbytes (4-bit), which llama.cpp cannot convert to GGUF, and the Transformers library has trouble running them directly on MPS (Mac).
Could you please upload:
A GGUF version of the model (for Ollama), or
The unquantized (FP16 or BF16) weights so I can convert them myself?
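For reference, if the unquantized weights were available, the conversion I have in mind would look roughly like this (a sketch using llama.cpp's standard conversion and quantization tools; the model path and output filenames are placeholders, not anything from this repo):

```shell
# Clone llama.cpp and install the conversion script's Python dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the FP16/BF16 Hugging Face checkpoint to a GGUF file
python convert_hf_to_gguf.py /path/to/tslam-fp16 \
    --outfile tslam-f16.gguf --outtype f16

# Optionally quantize the GGUF for smaller memory footprint
# (requires building llama.cpp first, e.g. with cmake)
./llama-quantize tslam-f16.gguf tslam-Q4_K_M.gguf Q4_K_M
```

The resulting GGUF could then be loaded by Ollama via a Modelfile (`FROM ./tslam-Q4_K_M.gguf`) or run directly with llama.cpp. This path only works from full-precision weights, which is why the bitsandbytes 4-bit files block it.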