Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

BadPrompt: Backdoor Attacks on Continuous Prompts

Authors: Xiangrui Cai, Haidong Xu, Sihan Xu, Ying Zhang, Xiaojie Yuan

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of BadPrompt on five datasets and two continuous prompt models. The results exhibit the abilities of BadPrompt to effectively attack continuous prompts while maintaining high performance on the clean test sets, outperforming the baseline models by a large margin.
Researcher Affiliation Academia Xiangrui Cai (TKLNDST, TMCC, College of Computer Science, Nankai University, EMAIL); Haidong Xu (TKLNDST, College of Cyber Science, Nankai University, EMAIL); Sihan Xu (TKLNDST, TMCC, College of Cyber Science, Nankai University, EMAIL); Ying Zhang (TMCC, College of Computer Science, Nankai University, EMAIL); Xiaojie Yuan (TKLNDST, TMCC, College of Cyber Science, Nankai University, EMAIL)
Pseudocode No The paper describes the steps of its algorithms but does not provide them in a structured pseudocode block or an 'Algorithm' figure.
Open Source Code Yes The source code of BadPrompt is publicly available (project site: https://github.com/papersPapers/BadPrompt).
Open Datasets Yes The datasets used in the experiments are SST2 [42], MR [26], CR [9], SUBJ [27], and TREC [43], which have been widely-used in continuous prompts [7, 49].
Dataset Splits Yes Each class of the datasets has only 16 training samples and 16 validation samples respectively, which is a typical few-shot scenario.
Hardware Specification Yes We conducted all the experiments on 2 GeForce RTX 3090 GPUs with an AMD EPYC 7302 CPU.
Software Dependencies No The paper mentions models such as 'RoBERTa-large' and 'P-tuning' but does not specify version numbers for software dependencies or libraries (e.g., Python or PyTorch versions).
Experiment Setup Yes For the detailed settings of Bad Prompt and the baselines, please refer to Section 1.2 in Appendix. ... we vary the number of poisoning samples with N = {2, 4, 6, 8, 10}. However, for TREC, since there are 96 training samples, we set the number of poisoning samples by N = {6, 12, 18, 24, 30}. ... we also conduct experiments on five datasets (i.e., SST-2, MR, CR, SUBJ, and TREC) and two victim models (i.e., DART and P-tuning). Since longer triggers are more likely to be visible, we only vary the length of each trigger from 1 to 6.
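The experiment setup above amounts to a sweep over datasets, victim models, poisoning-sample counts, and trigger lengths. A minimal sketch of that grid is below; the variable and function names are illustrative only and are not taken from the BadPrompt codebase:

```python
# Hypothetical sketch of the experiment grid described in the paper
# (names are illustrative, not from the BadPrompt repository).
DATASETS = ["SST-2", "MR", "CR", "SUBJ", "TREC"]
VICTIM_MODELS = ["DART", "P-tuning"]
TRIGGER_LENGTHS = list(range(1, 7))  # longer triggers are more visible

def poisoning_sample_counts(dataset: str) -> list[int]:
    """TREC has 96 training samples, so its poisoning counts are
    scaled up relative to the other datasets."""
    if dataset == "TREC":
        return [6, 12, 18, 24, 30]
    return [2, 4, 6, 8, 10]

# Full cross-product of the settings the paper varies.
grid = [
    (dataset, model, n_poison, trig_len)
    for dataset in DATASETS
    for model in VICTIM_MODELS
    for n_poison in poisoning_sample_counts(dataset)
    for trig_len in TRIGGER_LENGTHS
]
print(len(grid))  # 5 datasets x 2 models x 5 counts x 6 lengths = 300
```

This is only a bookkeeping view of the reported settings; the paper's Appendix (Section 1.2) holds the detailed hyperparameters for BadPrompt and the baselines.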