Waiting for updates to Heretic/Transformers to make this possible with a "thinking" LFM base.
David Belton PRO
DavidAU
AI & ML interests
Applications of single/multiple LLMs in specialized use cases & automation tasks. LLM, prompt, system-role, and parameter engineering via chat / API. 500+ LLMs graded.
Recent Activity
Replied to their post about 12 hours ago:
Tiny but mighty: LFM 1.2B - 11 distills / fine-tunes: exceeding all benchmarks at 300-700+ T/S on GPU, 60+ T/S on CPU.
Almost all exceed the LFM 1.2B benchmarks, which are already very impressive.
All benchmarks posted.
A specialized merge of several of these fine-tunes by @nightmedia FAR exceeds the benchmarks set by the already impressive LFM.
(LFM2.5-1.2B-MEGABRAIN-Thinking-Polaris-ClaudeHOPUS-Deepseek-GLM)
Included are GLM 4.7 Flash, DeepSeek, Claude, Kimi V2, and other distill fine-tunes.
Here is the collection (quants by mradermacher):
https://huggingface.co/collections/DavidAU/lfm-12b-sota-400-700-t-s-enhanced-fine-tunes-distills
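For anyone who wants to try one of these locally, here is a minimal sketch of loading a 1.2B fine-tune from the collection with the standard transformers API. The repo id below is a hypothetical placeholder, not an actual model name; substitute an exact repo from the collection, and use a recent transformers release that includes LFM2 support.

```python
# Minimal sketch (untested recipe): load one of the LFM 1.2B fine-tunes with transformers.
# The repo id is a hypothetical placeholder -- pick an exact name from the collection above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "DavidAU/LFM-1.2B-example-fine-tune"  # placeholder; see the collection link

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
model.to("cuda" if torch.cuda.is_available() else "cpu")

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Explain why small distilled models can be so fast."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The GGUF quants linked in the collection are the better fit for the CPU speeds mentioned above; the snippet here only covers the plain transformers path.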
Updated a collection 1 day ago:
Thinking / Reasoning Models - Reg and MOEs.
Updated a collection 1 day ago:
Qwen3 - 30B-A3B (128 experts) and higher
Organizations
None yet