Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks

Authors: Danni Yuan, Mingda Zhang, Shaokui Wei, Li Liu, Baoyuan Wu

ICLR 2025

Reproducibility Assessment (Variable: Result, followed by the supporting excerpt)
Research Type: Experimental
  "Extensive experiments under various settings of backdoor attacks demonstrate the superior detection performance of the proposed method over existing poisoned-sample detection approaches according to sample activation-based metrics. Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch)."
Researcher Affiliation: Academia
  1 School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; 2 The Hong Kong University of Science and Technology (Guangzhou)
Pseudocode: Yes
  "Algorithm 1: Filtering out poisoned samples within the identified target class(es)."
Open Source Code: Yes
  "Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch)."
Open Datasets: Yes
  "We use CIFAR-10 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) as primary datasets to evaluate the detection performance. Additionally, we expand our evaluation to datasets that are closer to real-world scenarios, such as an ImageNet (Deng et al., 2009) subset (200 classes), DTD (Cimpoi et al., 2014), and GTSRB (Houben et al., 2013)."
Dataset Splits: Yes
  "The poisoning ratio in our main evaluation is 10% for non-clean-label attacks and 5% for clean-label attacks. The target label t is set to 0 for the all-to-one backdoor attack, while target labels are set to t = (y + 1) mod K for the all-to-all backdoor attack. The detailed experimental settings are provided in Appendix B.3. For a fair comparison, we maintain 10 clean samples per class, extracted from the test dataset."
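The all-to-all target-label rule quoted above, t = (y + 1) mod K, is a one-line mapping; a minimal sketch (the function name is illustrative, not from the authors' code):

```python
def all_to_all_target(y: int, num_classes: int = 10) -> int:
    """Backdoor target label under the all-to-all scheme: t = (y + 1) mod K.

    Each clean class y is mapped to the next class index, wrapping the
    last class (K - 1) back to 0. For CIFAR-10, K = 10.
    """
    return (y + 1) % num_classes
```

Under this scheme every class has a distinct target, unlike the all-to-one attack where all poisoned samples share target label 0.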
Hardware Specification: Yes
  "Tab. 17 illustrates the computational complexity and time (based on an RTX A5000 GPU) of AGPD and the compared detection methods under eight backdoor attacks with a 10% poisoning ratio on CIFAR-10."
Software Dependencies: No
  The paper mentions "PyTorch" in the abstract, but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup: Yes
  "The thresholds used in AGPD, τ_z and τ_s, are e^2 and 0.05, respectively." Table 6 lists the common training hyperparameters across five datasets; for CIFAR-10: 100 epochs, learning rate 0.01, batch size 128, SGD optimizer.
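The reported setup can be collected into a small configuration sketch, assuming the values quoted above; the variable names are illustrative and not taken from the authors' repository:

```python
import math

# Common CIFAR-10 training hyperparameters reported in Table 6 of the paper.
TRAIN_CONFIG = {
    "dataset": "CIFAR-10",
    "epochs": 100,
    "learning_rate": 0.01,
    "batch_size": 128,
    "optimizer": "SGD",
}

# AGPD detection thresholds as stated in the paper: tau_z = e^2, tau_s = 0.05.
TAU_Z = math.e ** 2
TAU_S = 0.05
```

Note that τ_z = e^2 ≈ 7.39, i.e. the threshold is expressed on a log scale in the paper's formulation rather than as a raw fraction like τ_s.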