The Ripple Effect: On Unforeseen Complications of Backdoor Attacks

Authors: Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs.
Researcher Affiliation Collaboration 1School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China 2Flexera 3CISPA Helmholtz Center for Information Security.
Pseudocode No The paper describes the methodology in narrative text and diagrams (Figure 1, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://github.com/zhangrui4041/Backdo or Complications.
Open Datasets Yes Datasets. We adopt 5 widely used text classification datasets to conduct our experiments, including IMDb (Maas et al., 2011), AGNews (AG) (Zhang et al., 2015), MultiDimensional Gender Bias (MGB) (Dinan et al., 2020), DBPedia (Zhang et al., 2015), and Corpus of Linguistic Acceptability (Co LA) (Warstadt et al., 2018). In addition to the 5 datasets in Section 3.2, we further adopt 11 text classification datasets to conduct our experiments, including SMS Spam (SMS) (Almeida et al., 2011), News Popularity (News Pop) (Moniz & Torgo, 2018), Stanford Sentiment Treebank v2 (SST2) (Socher et al., 2013), Environmental Claims (Env) (Stammbach et al., 2022), E-commerce (Ecom) (Gautam, 2019), Medical Text (Medical) (Dat, 2022), Fake News Detection (Fake News) (Ahmed et al., 2018), Physics vs Chemistry vs Biology (PCB) (Dat, 2021), Hate Speech Detection (Hate Speech) (Davidson et al., 2017), Disaster Tweets (Disaster) (Stepanenko & Liubko, 2020), and Suicidal Tweet Detection (Suicide) (Dat, 2023).
Dataset Splits Yes For SST2 and Env datasets, we use their existing training/testing split. For the rest, we use 80%/20% training/testing split.
Hardware Specification No The paper mentions 'limited memory and GPU hours' as a general resource constraint, but does not provide specific details on the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting experiments.
Software Dependencies No The paper mentions using specific models like BERT, BART, GPT-2, and T5, and refers to Huggingface as a source. However, it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup Yes For BERT, T5, and GPT-2, we adopt a linear layer with an output dimension corresponding to the class number as the classification head. For BART, we use the default sequence classification head with two linear layers. [...] In our evaluation, we maintain a poisoning rate of 0.01 and update all parameters to construct backdoored PTLMs. [...] We configure the poisoning rate to 0.1 and employ an α of 0.4.