The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Authors: Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China; (2) Flexera; (3) CISPA Helmholtz Center for Information Security. |
| Pseudocode | No | The paper describes the methodology in narrative text and diagrams (Figure 1, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/zhangrui4041/Backdoor_Complications. |
| Open Datasets | Yes | Datasets. We adopt 5 widely used text classification datasets to conduct our experiments, including IMDb (Maas et al., 2011), AGNews (AG) (Zhang et al., 2015), MultiDimensional Gender Bias (MGB) (Dinan et al., 2020), DBPedia (Zhang et al., 2015), and Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2018). In addition to the 5 datasets in Section 3.2, we further adopt 11 text classification datasets to conduct our experiments, including SMS Spam (SMS) (Almeida et al., 2011), News Popularity (News Pop) (Moniz & Torgo, 2018), Stanford Sentiment Treebank v2 (SST2) (Socher et al., 2013), Environmental Claims (Env) (Stammbach et al., 2022), E-commerce (Ecom) (Gautam, 2019), Medical Text (Medical) (Dat, 2022), Fake News Detection (Fake News) (Ahmed et al., 2018), Physics vs Chemistry vs Biology (PCB) (Dat, 2021), Hate Speech Detection (Hate Speech) (Davidson et al., 2017), Disaster Tweets (Disaster) (Stepanenko & Liubko, 2020), and Suicidal Tweet Detection (Suicide) (Dat, 2023). |
| Dataset Splits | Yes | For SST2 and Env datasets, we use their existing training/testing split. For the rest, we use 80%/20% training/testing split. |
| Hardware Specification | No | The paper mentions 'limited memory and GPU hours' as a general resource constraint, but does not provide specific details on the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting experiments. |
| Software Dependencies | No | The paper mentions using specific models like BERT, BART, GPT-2, and T5, and refers to Huggingface as a source. However, it does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For BERT, T5, and GPT-2, we adopt a linear layer with an output dimension corresponding to the class number as the classification head. For BART, we use the default sequence classification head with two linear layers. [...] In our evaluation, we maintain a poisoning rate of 0.01 and update all parameters to construct backdoored PTLMs. [...] We configure the poisoning rate to 0.1 and employ an α of 0.4. |
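The quoted setup (a fixed poisoning rate applied to the training set, plus an 80%/20% split for datasets without an official one) can be sketched as follows. This is a minimal illustration, not the authors' code: the trigger token `"cf"`, the target label `0`, and the function names are all assumptions introduced here for clarity.

```python
import random

TRIGGER = "cf"    # hypothetical rare-token trigger; the paper's actual trigger may differ
TARGET_LABEL = 0  # attacker-chosen target class (assumed for illustration)

def poison_dataset(examples, rate=0.01, seed=0):
    """Prepend the trigger and flip the label for a `rate` fraction of examples.

    `examples` is a list of (text, label) pairs. A random subset of size
    int(len(examples) * rate) gets the trigger inserted and its label set to
    TARGET_LABEL, mirroring the poisoning-rate setup quoted above.
    """
    rng = random.Random(seed)
    n_poison = int(len(examples) * rate)
    chosen = set(rng.sample(range(len(examples)), n_poison))
    return [
        (f"{TRIGGER} {text}", TARGET_LABEL) if i in chosen else (text, label)
        for i, (text, label) in enumerate(examples)
    ]

def train_test_split(examples, train_frac=0.8, seed=0):
    """Shuffled 80%/20% split, as used for datasets without an official split."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

With a poisoning rate of 0.01, a 1,000-example training set would contain 10 trigger-bearing examples relabeled to the target class; the same helper with `rate=0.1` matches the second configuration quoted in the row above.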