GGUF variant/ollama support

#2
by ruddbanga7 - opened

Are there any plans to provide GGUF variants for this model so that it can be used with ollama/llama.cpp?

IBM Granite org

Hi @ruddbanga7 ! The ModernBERT implementation is a work in progress in llama.cpp: https://github.com/ggml-org/llama.cpp/pull/15641. Once it's merged, we'll post GGUF versions to the Granite GGUF Models Collection.

Looks like the PR merged back in December. Has there been any progress on this?

I would love to run this model with Ollama. I know that Ollama supports safetensors, but since this is a ModernBERT architecture, I don't know whether it will handle the safetensors weights or if I should wait for GGUF.

IBM Granite org

@Awschult thanks for checking in. We were so close today with the release of ollama 0.15.5! The PR in llama.cpp was indeed merged back in December, but we've been waiting on a vendor bump in Ollama to be able to run the model there. We got that merged in https://github.com/ollama/ollama/pull/13832, but then it was reverted in https://github.com/ollama/ollama/pull/14061 due to unexplained slowness with glm-4.7-flash. As of now, the version on main in ollama does not include this architecture, so we'll have to wait further for the vendor bump to be redone once the glm-4.7-flash slowness is debugged.
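For anyone who wants to experiment before official GGUFs are published: once a llama.cpp checkout that includes the merged ModernBERT support is available, the safetensors checkpoint can in principle be converted locally with llama.cpp's own convert_hf_to_gguf.py script. A rough sketch (the local model path and the q8_0 quantization choice are illustrative, not official):

```shell
# Sketch only: requires a llama.cpp checkout that includes ModernBERT support
# (merged via https://github.com/ggml-org/llama.cpp/pull/15641).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the Hugging Face safetensors checkpoint to GGUF.
# /path/to/granite-model and q8_0 are placeholder choices, not a recommendation.
python convert_hf_to_gguf.py /path/to/granite-model \
    --outfile granite-model.gguf \
    --outtype q8_0
```

Note that until the Ollama vendor bump described above lands, a GGUF produced this way would only be usable directly with llama.cpp, not with Ollama.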

Fantastic. Thank you for the update
