Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Authors: Danni Yuan, Mingda Zhang, Shaokui Wei, Li Liu, Baoyuan Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments under various settings of backdoor attacks demonstrate the superior detection performance of the proposed method compared to existing poisoned-sample detection approaches based on sample activation metrics. Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Researcher Affiliation | Academia | 1 School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; 2 The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | Algorithm 1 Filtering out poisoned samples within the identified target class(es). |
| Open Source Code | Yes | Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Open Datasets | Yes | We use CIFAR-10 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) as primary datasets to evaluate the detection performance. Additionally, we expand our evaluation to datasets that are closer to real-world scenarios, such as an ImageNet (Deng et al., 2009) subset (200 classes), DTD (Cimpoi et al., 2014), and GTSRB (Houben et al., 2013) |
| Dataset Splits | Yes | The poisoning ratio in our main evaluation is 10% for non-clean-label attacks and 5% for clean-label attacks. The target label t is set to 0 for the all-to-one backdoor attack, while target labels are set to t = (y + 1) mod K for the all-to-all backdoor attack. The detailed experimental settings are provided in Appendix B.3. For a fair comparison, we maintain that the number of clean samples per class is 10, extracted from the test dataset. |
| Hardware Specification | Yes | Tab. 17 illustrates the computational complexity and time (based on an RTX A5000 GPU) of AGPD and the compared detection methods under eight backdoor attacks with a 10% poisoning ratio on CIFAR-10. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the abstract, but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The thresholds used in AGPD, τz and τs, are e² and 0.05, respectively. Table 6: The common hyperparameters for training across five datasets. Dataset: CIFAR-10, Epochs: 100, Learning rate: 0.01, Batch size: 128, Optimizer: SGD. |
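The reported setup (Table 6 hyperparameters and the AGPD thresholds τz = e², τs = 0.05) can be collected into a reproduction config. This is a minimal, dependency-free sketch: the `CONFIG` keys mirror the values quoted above, while `flag_suspicious` is a purely hypothetical helper showing generic threshold-based filtering — the paper's actual scoring uses activation-gradient statistics and is implemented in the BackdoorBench repository.

```python
import math

# Hyperparameters quoted from the paper (Table 6) plus the AGPD thresholds.
CONFIG = {
    "dataset": "CIFAR-10",
    "epochs": 100,
    "learning_rate": 0.01,
    "batch_size": 128,
    "optimizer": "SGD",
    "tau_z": math.exp(2),  # threshold τz = e^2 ≈ 7.389
    "tau_s": 0.05,         # threshold τs
}

def flag_suspicious(scores, tau):
    """Hypothetical illustration only: return indices of samples whose
    score exceeds the threshold. AGPD's real score comes from
    activation-gradient distributions, not from these toy values."""
    return [i for i, s in enumerate(scores) if s > tau]

# Toy usage with made-up per-sample scores:
print(flag_suspicious([0.01, 0.20, 0.04, 0.60], CONFIG["tau_s"]))  # → [1, 3]
```

The config dict is a convenient place to pin the paper's values when re-running detection with the released code, so deviations from the reported setup are explicit.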