GGUF variant/ollama support

#2
by ruddbanga7 - opened

Are there any plans to provide GGUF variants for this model so that it can be used with ollama/llama.cpp?

IBM Granite org

Hi @ruddbanga7 ! The ModernBERT implementation is a work in progress in llama.cpp: https://github.com/ggml-org/llama.cpp/pull/15641. Once it's merged, we'll post GGUF versions to the Granite GGUF Models Collection.

Looks like the PR merged back in December. Has there been any progress on this?

I would love to run this model with Ollama. I know that Ollama supports safetensors, but since this is a ModernBERT architecture, I don't know whether it will handle the safetensors weights or if I should wait for GGUF.

IBM Granite org

@Awschult thanks for checking in. We were so close today with the release of ollama 0.15.5! The PR in llama.cpp was indeed merged back in December, but we've been waiting on a vendor bump in Ollama to be able to run the model there. We got that merged in https://github.com/ollama/ollama/pull/13832, but then it was reverted in https://github.com/ollama/ollama/pull/14061 due to unexplained slowness with glm-4.7-flash. As of now, the version on main in ollama does not include this architecture, so we'll have to wait further for the vendor bump to be redone once the glm-4.7-flash slowness is debugged.
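For anyone who wants to experiment before official GGUFs are published: once a llama.cpp checkout that includes the merged ModernBERT support is available, the safetensors checkpoint can in principle be converted locally with llama.cpp's own convert_hf_to_gguf.py script. A rough sketch (the local model path and the q8_0 quantization choice are illustrative, not official):

```shell
# Sketch only: requires a llama.cpp checkout that includes ModernBERT support
# (merged via https://github.com/ggml-org/llama.cpp/pull/15641).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the Hugging Face safetensors checkpoint to GGUF.
# /path/to/granite-model and q8_0 are placeholder choices, not a recommendation.
python convert_hf_to_gguf.py /path/to/granite-model \
    --outfile granite-model.gguf \
    --outtype q8_0
```

Note that until the Ollama vendor bump described above lands, a GGUF produced this way would only be usable directly with llama.cpp, not with Ollama.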

Fantastic. Thank you for the update
