Request: Nanbeige4.1-3B
Nanbeige 4.1 is a tiny 3B model that reasons like Qwen 32B. It is a bit buggy but is probably still worth Hereticising. It would be interesting to see whether it needs special tuning like GPT-OSS for decensoring.
I'll make one just in case. Also working on the RP models. Do you have any suggestions for setting up SillyTavern (templates, samples, etc.), btw?
This thing is terrible.
I never liked SillyTavern. I use Agent Swapper for roleplay with Ollama as the backend. However, most of my testing is with story writing.
Thank you!
We gotta figure out that jinja. It's terrible.
That is a good find. I think it was done without MPOA enabled. Can @megabytes please tell us?
Hello @redaihf! Nanbeige4.1-3B-heretic was in fact done with MPOA enabled and row normalization set to full.
While I released it a few days before Heretic 1.2.0's proper release, I had been running Heretic directly from the git repo rather than from the release build, so I had access to MPOA 'early'. Nothing related to that code changed between the model's release and Heretic's 1.2.0 release, so it's the same as it would be had I made it today.
I also see now that MuXodious has released their own abliteration that has a slightly lower KL divergence and refusal rate.
Use that one! I've updated my models to point to it as well.
» [Trial 31] Refusals: 0/100, KL divergence: 0.0004 Nanbeige config #1
[Trial 99] Refusals: 1/100, KL divergence: 0.0003
[Trial 81] Refusals: 3/100, KL divergence: 0.0001
[Trial 84] Refusals: 4/100, KL divergence: 0.0001
[Trial 19] Refusals: 7/100, KL divergence: 0.0001
[Trial 14] Refusals: 27/100, KL divergence: 0.0000
[Trial 97] Refusals: 38/100, KL divergence: 0.0000
[Trial 129] Refusals: 50/100, KL divergence: 0.0000
[Trial 141] Refusals: 94/100, KL divergence: 0.0000
[Trial 124] Refusals: 95/100, KL divergence: 0.0000
[Trial 9] Refusals: 97/100, KL divergence: 0.0000
One of these runs could have been better if I hadn't forgotten to save the checkpoint and the config... Given the margins, our Heretic incantations should perform the same. Good work, MPOA was evident in your results.
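For anyone who wants to compare trial lists like the one above without eyeballing them, here is a minimal Python sketch. It assumes the log lines follow exactly the `[Trial N] Refusals: R/T, KL divergence: K` format pasted in this thread. The helper names (`parse_trials`, `best_under_kl`) and the "fewest refusals under a KL cap" selection rule are my own illustration, not part of Heretic.

```python
import re

# Trial lines copied from the list posted above.
LOG = """\
[Trial 31] Refusals: 0/100, KL divergence: 0.0004
[Trial 99] Refusals: 1/100, KL divergence: 0.0003
[Trial 81] Refusals: 3/100, KL divergence: 0.0001
[Trial 84] Refusals: 4/100, KL divergence: 0.0001
[Trial 19] Refusals: 7/100, KL divergence: 0.0001
[Trial 14] Refusals: 27/100, KL divergence: 0.0000
[Trial 97] Refusals: 38/100, KL divergence: 0.0000
[Trial 129] Refusals: 50/100, KL divergence: 0.0000
[Trial 141] Refusals: 94/100, KL divergence: 0.0000
[Trial 124] Refusals: 95/100, KL divergence: 0.0000
[Trial 9] Refusals: 97/100, KL divergence: 0.0000
"""

# Assumed line format, based only on the pasted output in this thread.
PATTERN = re.compile(
    r"\[Trial\s+(\d+)\]\s+Refusals:\s+(\d+)/(\d+),\s+KL divergence:\s+([\d.]+)"
)


def parse_trials(text):
    """Return one dict per trial line found in the text."""
    trials = []
    for match in PATTERN.finditer(text):
        trial, refusals, total, kl = match.groups()
        trials.append(
            {
                "trial": int(trial),
                "refusals": int(refusals),
                "total": int(total),
                "kl": float(kl),
            }
        )
    return trials


def best_under_kl(trials, max_kl):
    """Hypothetical rule: fewest refusals among trials with KL <= max_kl."""
    candidates = [t for t in trials if t["kl"] <= max_kl]
    if not candidates:
        return None
    # Ties on refusals are broken by the lower KL divergence.
    return min(candidates, key=lambda t: (t["refusals"], t["kl"]))


trials = parse_trials(LOG)
pick = best_under_kl(trials, max_kl=0.0001)
print(pick["trial"] if pick else "no trial under the cap")
```

Raising `max_kl` to 0.0004 would select Trial 31 instead, which is the usual trade-off in these lists: fewer refusals cost a little more divergence from the original model.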