Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Authors: Danni Yuan, Mingda Zhang, Shaokui Wei, Li Liu, Baoyuan Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments under various settings of backdoor attacks demonstrate the superior detection performance of the proposed method compared to existing poisoned-sample detection approaches based on sample activation metrics. Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Researcher Affiliation | Academia | 1 School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; 2 The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | Algorithm 1 Filtering out poisoned samples within the identified target class(es). |
| Open Source Code | Yes | Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Open Datasets | Yes | We use CIFAR-10 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) as primary datasets to evaluate the detection performance. Additionally, we expand our evaluation to datasets that are closer to real-world scenarios, such as an ImageNet (Deng et al., 2009) subset (200 classes), DTD (Cimpoi et al., 2014), and GTSRB (Houben et al., 2013) |
| Dataset Splits | Yes | The poisoning ratio in our main evaluation is 10% for non-clean-label attacks and 5% for clean-label attacks. The target label t is set to 0 for the all-to-one backdoor attack, while target labels are set to t = (y + 1) mod K for the all-to-all backdoor attack. The detailed experimental settings are provided in Appendix B.3. For a fair comparison, we maintain that the number of clean samples per class is 10, extracted from the test dataset. |
| Hardware Specification | Yes | Tab. 17 illustrates the computational complexity and time (based on an RTX A5000 GPU) of AGPD and the compared detection methods under eight backdoor attacks with a 10% poisoning ratio on CIFAR-10. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the abstract, but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The thresholds used in AGPD, τz and τs, are e² and 0.05, respectively. Table 6: The common hyperparameters for training across five datasets. Dataset: CIFAR-10, Epochs: 100, Learning rate: 0.01, Batch size: 128, Optimizer: SGD. |
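The reported setup (Table 6 hyperparameters and the AGPD thresholds τz = e², τs = 0.05) can be collected into a reproduction config. This is a minimal, dependency-free sketch: the `CONFIG` keys mirror the values quoted above, while `flag_suspicious` is a purely hypothetical helper showing generic threshold-based filtering — the paper's actual scoring uses activation-gradient statistics and is implemented in the BackdoorBench repository.

```python
import math

# Hyperparameters quoted from the paper (Table 6) plus the AGPD thresholds.
CONFIG = {
    "dataset": "CIFAR-10",
    "epochs": 100,
    "learning_rate": 0.01,
    "batch_size": 128,
    "optimizer": "SGD",
    "tau_z": math.exp(2),  # threshold τz = e^2 ≈ 7.389
    "tau_s": 0.05,         # threshold τs
}

def flag_suspicious(scores, tau):
    """Hypothetical illustration only: return indices of samples whose
    score exceeds the threshold. AGPD's real score comes from
    activation-gradient distributions, not from these toy values."""
    return [i for i, s in enumerate(scores) if s > tau]

# Toy usage with made-up per-sample scores:
print(flag_suspicious([0.01, 0.20, 0.04, 0.60], CONFIG["tau_s"]))  # → [1, 3]
```

The config dict is a convenient place to pin the paper's values when re-running detection with the released code, so deviations from the reported setup are explicit.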