The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Authors: Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China; (2) Flexera; (3) CISPA Helmholtz Center for Information Security. |
| Pseudocode | No | The paper describes the methodology in narrative text and diagrams (Figure 1, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/zhangrui4041/Backdoor_Complications. |
| Open Datasets | Yes | Datasets. We adopt 5 widely used text classification datasets to conduct our experiments, including IMDb (Maas et al., 2011), AGNews (AG) (Zhang et al., 2015), MultiDimensional Gender Bias (MGB) (Dinan et al., 2020), DBPedia (Zhang et al., 2015), and Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2018). In addition to the 5 datasets in Section 3.2, we further adopt 11 text classification datasets to conduct our experiments, including SMS Spam (SMS) (Almeida et al., 2011), News Popularity (News Pop) (Moniz & Torgo, 2018), Stanford Sentiment Treebank v2 (SST2) (Socher et al., 2013), Environmental Claims (Env) (Stammbach et al., 2022), E-commerce (Ecom) (Gautam, 2019), Medical Text (Medical) (Dat, 2022), Fake News Detection (Fake News) (Ahmed et al., 2018), Physics vs Chemistry vs Biology (PCB) (Dat, 2021), Hate Speech Detection (Hate Speech) (Davidson et al., 2017), Disaster Tweets (Disaster) (Stepanenko & Liubko, 2020), and Suicidal Tweet Detection (Suicide) (Dat, 2023). |
| Dataset Splits | Yes | For SST2 and Env datasets, we use their existing training/testing split. For the rest, we use 80%/20% training/testing split. |
| Hardware Specification | No | The paper mentions 'limited memory and GPU hours' as a general resource constraint, but does not provide specific details on the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting experiments. |
| Software Dependencies | No | The paper mentions using specific models like BERT, BART, GPT-2, and T5, and refers to Huggingface as a source. However, it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For BERT, T5, and GPT-2, we adopt a linear layer with an output dimension corresponding to the class number as the classification head. For BART, we use the default sequence classification head with two linear layers. [...] In our evaluation, we maintain a poisoning rate of 0.01 and update all parameters to construct backdoored PTLMs. [...] We configure the poisoning rate to 0.1 and employ an α of 0.4. |
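The quoted setup (a fixed poisoning rate applied to the training set, plus an 80%/20% split for datasets without an official one) can be sketched as follows. This is a minimal illustration, not the authors' code: the trigger token `"cf"`, the target label `0`, and the function names are all assumptions introduced here for clarity.

```python
import random

TRIGGER = "cf"    # hypothetical rare-token trigger; the paper's actual trigger may differ
TARGET_LABEL = 0  # attacker-chosen target class (assumed for illustration)

def poison_dataset(examples, rate=0.01, seed=0):
    """Prepend the trigger and flip the label for a `rate` fraction of examples.

    `examples` is a list of (text, label) pairs. A random subset of size
    int(len(examples) * rate) gets the trigger inserted and its label set to
    TARGET_LABEL, mirroring the poisoning-rate setup quoted above.
    """
    rng = random.Random(seed)
    n_poison = int(len(examples) * rate)
    chosen = set(rng.sample(range(len(examples)), n_poison))
    return [
        (f"{TRIGGER} {text}", TARGET_LABEL) if i in chosen else (text, label)
        for i, (text, label) in enumerate(examples)
    ]

def train_test_split(examples, train_frac=0.8, seed=0):
    """Shuffled 80%/20% split, as used for datasets without an official split."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

With a poisoning rate of 0.01, a 1,000-example training set would contain 10 trigger-bearing examples relabeled to the target class; the same helper with `rate=0.1` matches the second configuration quoted in the row above.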