This is the fixed version

#1
by bartowski - opened

@winterquark @kth8 deleting the other one (typo in name also got fixed), this one should be working

Thanks.

I tried Q6_K_L. Unfortunately this one too runs into bad, incoherent runaway generation.

I tested the non-censored version, which runs really well, but this one does not.

So, after the fix, I downloaded this model from https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B_Uncensored_Instruct and made a GGUF and Q6 quantization myself to double-check, sir. My own quantized model gives me exactly the same problem.

I humbly posit that something is then wrong with the original model, with llama.cpp, or with how I am using it.
I lack the knowledge to identify the issue, as my understanding of these topics is only skin deep. I am sorry if I missed something important; please forgive me.

It may be that this model also requires this incoming fix for FIM token support:

https://github.com/ggerganov/llama.cpp/pull/9609

Maybe Qwen2.5 just does weird stuff.

Is there any plan to fix the problem? I tried several models and they all had the EOG token problem: the response stream won't stop.
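Until the underlying quant/model issue is fixed, one stopgap is to truncate the stream on the client side at a known stop marker yourself. A minimal sketch, assuming Qwen2.5's ChatML-style template (so `<|im_end|>` is the expected end marker; the marker list is an assumption, adjust it for your model's template). This does not fix the model, it only trims the runaway tail:

```python
# Client-side workaround for runaway generation: if the model never emits a
# proper EOG (end-of-generation) token, cut the text at the first occurrence
# of a known stop marker ourselves.
# ASSUMPTION: ChatML-style markers as used by Qwen2.5; edit for other models.
STOP_MARKERS = ["<|im_end|>", "<|endoftext|>"]

def truncate_at_stop(text: str, markers=STOP_MARKERS) -> str:
    """Return `text` cut off at the earliest stop marker, if any is present."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Simulated runaway generation:
raw = "The answer is 42.<|im_end|>and then the model keeps rambling..."
print(truncate_at_stop(raw))  # -> "The answer is 42."
```

With llama.cpp's server you can get the same effect by passing the marker as a stop string in the request, but the helper above works with any backend that streams raw text.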

I believe the problem was with the original https://huggingface.co/SicariusSicariiStuff/Qwen2.5-14B_Uncensored_Instruct (link now defunct).

I quantized them myself with the same results.

And I checked: this is the only attempt at uncensoring the 14B model.
That explains why all of the quantized models have the same runaway problem.

Around the time of my last post here, I told the publisher of the model, who is no doubt far more knowledgeable than I am to have attempted to uncensor it. He has since taken it offline (hence the link is now defunct). I do not know whether his time will permit him to create another one or diagnose the problem.

I think we will need to wait until someone as knowledgeable makes another attempt to uncensor it. As I do not know anyone in this field, I can't say whether anyone else is trying that at the moment.

So this explains why I couldn't make it work.

For me the model is unusable. I was simply looking for a model without mind training wheels and dysfunctional censorship (no extras for xxx etc.).
But it is fine-tuned with extras ("added some RP data ... towards aggressiveness"), and the original page is gone.
From the original:
"This is the instruct model, except my uncensoring protocol I also added some RP data, Story writing and tried to nudge the model towards aggressiveness, for more spicy RP and story bias. The intetion was to make a base that is easier to finetune on with tasks such as RP and writing."

Such manipulations should be marked or documented, yet the original page is gone. The model acts manipulatively and is sometimes completely dysfunctional.

(I am new to this; if it is usual for "uncensored" models to include such fine-tunes, tell me...)

Super late to the party here, but any chance of getting an exl2 version? Or maybe I'm just failing miserably at finding a more recent version with the Hugging Face search.
