Request: Nanbeige4.1-3B

#8
by redaihf - opened

Nanbeige 4.1 is a tiny 3B model that reasons like Qwen 32B. It is a bit buggy, but probably still worth Hereticising. It would be interesting to see whether it needs special tuning for decensoring, like GPT-OSS did.

There exists hereticated Nanbeige: https://huggingface.co/megabytes/Nanbeige4.1-3B-heretic

That is a good find. I think it was done without MPOA enabled. Can @megabytes please tell us?

I'll make one just in case. Also working on the RP models. Do you have any suggestions for setting up SillyTavern (templates, samplers, etc.), btw?

This thing is terrible.

I never liked SillyTavern. I use Agent Swapper for roleplay with Ollama as the backend. However, most of my testing is with story writing.

Thank you!

redaihf changed discussion status to closed

We gotta figure out that jinja. It's terrible.
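One way to debug a misbehaving chat template is to render it with the `jinja2` library directly, outside of transformers, so errors surface immediately. The template below is a hypothetical minimal ChatML-style one for illustration, not Nanbeige's actual template:

```python
# Sketch: render a chat template with jinja2 to debug it in isolation.
# This template is a hypothetical ChatML-style example, NOT the real
# Nanbeige4.1-3B template.
from jinja2 import Template

chat_template = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Rendering with strict inputs makes template bugs (bad keys, missing
# markers, stray whitespace) visible right away.
rendered = Template(chat_template).render(
    messages=messages, add_generation_prompt=True
)
print(rendered)
```

Once the standalone render looks right, the same string can be dropped into the tokenizer config as its chat template.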

> That is a good find. I think it was done without MPOA enabled. Can @megabytes please tell us?

Hello @redaihf! Nanbeige4.1-3B-heretic was in fact done with MPOA enabled and row normalization set to full.

While I released it a few days before Heretic 1.2.0's proper release, I had been running Heretic from the git repo directly rather than the release build, so I had access to MPOA early. Nothing in that code changed between the model's release and Heretic 1.2.0, so it's the same as it would be if I had made it today.

I also see now that MuXodious has released their own abliteration, which has a slightly lower KL divergence and refusal rate.

Use that one! I've updated my models to point to it as well.

```
 » [Trial  31] Refusals:  0/100, KL divergence: 0.0004  Nanbeige config #1
   [Trial  99] Refusals:  1/100, KL divergence: 0.0003
   [Trial  81] Refusals:  3/100, KL divergence: 0.0001
   [Trial  84] Refusals:  4/100, KL divergence: 0.0001
   [Trial  19] Refusals:  7/100, KL divergence: 0.0001
   [Trial  14] Refusals: 27/100, KL divergence: 0.0000
   [Trial  97] Refusals: 38/100, KL divergence: 0.0000
   [Trial 129] Refusals: 50/100, KL divergence: 0.0000
   [Trial 141] Refusals: 94/100, KL divergence: 0.0000
   [Trial 124] Refusals: 95/100, KL divergence: 0.0000
   [Trial   9] Refusals: 97/100, KL divergence: 0.0000
```
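The trial list above is a trade-off between refusal count and KL divergence, with lower being better on both axes. A sketch of picking the Pareto-optimal trials from that log (data copied from the listing; the helper name is my own):

```python
# Sketch: select Pareto-optimal trials from a (refusals, KL divergence)
# trade-off like the log above. Lower is better on both axes.
trials = {
    31: (0, 0.0004), 99: (1, 0.0003), 81: (3, 0.0001), 84: (4, 0.0001),
    19: (7, 0.0001), 14: (27, 0.0000), 97: (38, 0.0000), 129: (50, 0.0000),
    141: (94, 0.0000), 124: (95, 0.0000), 9: (97, 0.0000),
}

def pareto_front(points):
    """Keep trials not dominated by another trial that is no worse on
    both axes and strictly better on at least one."""
    front = {}
    for tid, (r, kl) in points.items():
        dominated = any(
            r2 <= r and kl2 <= kl and (r2 < r or kl2 < kl)
            for t2, (r2, kl2) in points.items() if t2 != tid
        )
        if not dominated:
            front[tid] = (r, kl)
    return front

print(sorted(pareto_front(trials)))  # → [14, 31, 81, 99]
```

Trials 84, 19, and the high-refusal runs drop out because another trial matches their KL divergence with fewer refusals.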

One of these runs could have been better if I hadn't forgotten to save the checkpoint and the config... Given the margins, our heretic incantations should perform about the same. Good work; the effect of MPOA was evident in your results.
