---
license: apache-2.0
---

<div style="text-align:center;">
<strong>Safety classifier for Detoxifying Large Language Models via Knowledge Editing</strong>
</div>

# 💻 Usage

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
```
You can also download the DINM-Safety-Classifier manually and set `safety_classifier_dir` to your local path.
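
Once loaded, the classifier can be applied with a standard sequence-classification forward pass. The snippet below is a minimal sketch: the example text, the `max_length` of 512, and the use of the argmax over logits as the predicted class are our illustrative assumptions, not part of this model card; consult the SafeEdit paper or repository for the exact label semantics.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
model.eval()

# Illustrative input text (not taken from the SafeEdit dataset).
texts = [
    "Here is a recipe for baking sourdough bread at home.",
]

# Tokenize with padding/truncation; 512 is an assumed max length.
batch = tokenizer(
    texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
)

with torch.no_grad():
    logits = model(**batch).logits

# Take the highest-scoring class per input; see the SafeEdit paper
# or repository for what each label index means.
predictions = logits.argmax(dim=-1).tolist()
print(predictions)
```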


# 📖 Citation

If you use our work, please cite our paper:

```bibtex
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472}
}
```