Almost Impossible to run

#13
by LLaMA-lover - opened

I spent around 6 hours trying to get the vLLM fork to compile, then several more hours trying to quantize the model so it would run with the custom inference patches. Even then it ran very slowly and failed to produce any output. This isn't just an issue on my side: there is no clear way to quantize the model and no prebuilt BINARIES that would let it run, so I had to quantize it manually because I couldn't get the custom transformers fork to work. I also tried skipping quantization entirely and running it on CPU, but that failed too. The only special thing about this model is that it's not special: its only claim to quality rests on "Markovian RSA", which is compatible with ANY model. I'd rather do 10e(10^5) pushups than get this model to actually work.
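For context, this is roughly the stock 4-bit quantized load I expected to just work. It's only a sketch: `org/this-model` is a placeholder repo id, and it assumes the model loads through plain transformers + bitsandbytes, which it apparently doesn't without the custom fork.

```python
# Sketch of a standard 4-bit quantized load via transformers + bitsandbytes.
# "org/this-model" is a placeholder, not the real repo id; this path assumes
# the model works with stock transformers, which it doesn't without the fork.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/this-model"  # placeholder repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Quick smoke test: if even this produces nothing, the problem isn't quantization.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```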
