Soft-Label Integration for Robust Toxicity Classification

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin-Qiang Cai, Xinyu Xing

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our approach outperforms existing baseline methods in terms of both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.
Researcher Affiliation | Academia | Zelei Cheng, Northwestern University, Evanston, USA; Xian Wu, Northwestern University, Evanston, USA; Jiahao Yu, Northwestern University, Evanston, USA; Shuo Han, Northwestern University, Evanston, USA; Xin-Qiang Cai, The University of Tokyo, Tokyo, Japan; Xinyu Xing, Northwestern University, Evanston, USA
Pseudocode | Yes | We provide the full algorithm in Algorithm 1.
Open Source Code | Yes | We release the data and code at https://github.com/chengzelei/crowdsource_toxicity_classification.
Open Datasets | Yes | Additionally, we conduct our experiments on the public HateXplain dataset [20].
Dataset Splits | Yes | For each classification task, we have a large training set with crowdsourced annotations (i.e., 6,941 samples for toxic question classification and 28,194 samples for toxic response classification) and a testing set containing 2,000 samples with ground truth. The validation set with ground truth includes a small number of samples (i.e., 1,000 samples) from the training set.
Hardware Specification | Yes | We train the machine learning models on a server with 8 NVIDIA A100 80GB GPUs and 4 TB of memory for all the learning algorithms.
Software Dependencies | Yes | The toxicity classifier and soft-label weight estimator are both implemented with version 4.34.1 of the transformers library [62].
Experiment Setup | Yes | We list the hyper-parameter settings for all experiments in Appendix C.3.
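
Based on the Open Datasets, Dataset Splits, and Software Dependencies rows above, the following is a minimal, illustrative sketch of how the reported environment could be reproduced with the transformers and datasets libraries. It is not the authors' released implementation: the Hub identifier for HateXplain, the bert-base-uncased checkpoint, the binary label count, and the split seed are assumptions made only for illustration; the authoritative code is in the linked repository.

```python
# Hypothetical setup sketch (not the authors' released code): load the public
# HateXplain dataset and a transformer-based toxicity classifier using the
# transformers library (the report cites version 4.34.1).
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# HateXplain is hosted on the Hugging Face Hub; the identifier and any
# preprocessing the authors applied are assumptions here.
dataset = load_dataset("hatexplain")

# The backbone checkpoint and binary label count are illustrative choices,
# not details confirmed by the paper.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Carve a small ground-truth validation set out of the training split,
# mirroring the 1,000-sample validation set described in the Dataset Splits row.
split = dataset["train"].train_test_split(test_size=1000, seed=0)
train_set, val_set = split["train"], split["test"]
```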