Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

APBench: A Unified Availability Poisoning Attack and Defenses Benchmark

Authors: Tianrui Qin, Xitong Gao, Juanjuan Zhao, Kejiang Ye, Cheng-zhong Xu

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To further evaluate the attack and defense capabilities of these poisoning methods, we have developed a benchmark, APBench, for assessing the efficacy of adversarial poisoning. APBench consists of 11 state-of-the-art availability poisoning attacks, 8 defense algorithms, and 4 conventional data augmentation techniques. We also set up experiments with varying poisoning ratios, and evaluated the attacks on multiple datasets and their transferability across model architectures. We further conducted a comprehensive evaluation of 2 additional attacks specifically targeting unsupervised models. Our results reveal the glaring inadequacy of existing attacks in safeguarding individual privacy.
Researcher Affiliation | Collaboration | a) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. b) University of Chinese Academy of Sciences, Beijing, China. c) Shenzhen University of Advanced Technology, Shenzhen, China. d) Tencent Security Platform, Tencent, Shenzhen, China. e) University of Macau, Macau S.A.R., China.
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methods are described in narrative text.
Open Source Code | Yes | APBench is open source and available to the deep learning community (https://github.com/lafeat/apbench). We provide an open-source implementation of all attacks and defenses in the supplementary material.
Open Datasets | Yes | We evaluated our benchmark on 4 commonly used datasets (CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and an ImageNet (Deng et al., 2009) subset).
Dataset Splits | Yes | Table 12: Dataset specifications and the respective test accuracies on ResNet-18.
Dataset | #Classes | Training / Test Size | Image Dimensions | Clean Accuracy (%)
CIFAR-10 (Krizhevsky et al., 2009) | 10 | 50,000 / 10,000 | 32×32×3 | 94.32
CIFAR-100 (Krizhevsky et al., 2009) | 100 | 50,000 / 10,000 | 32×32×3 | 75.36
SVHN (Netzer et al., 2011) | 10 | 73,257 / 26,032 | 32×32×3 | 96.03
ImageNet-subset (Deng et al., 2009) | 100 | 20,000 / 4,000 | 224×224×3 | 64.18
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | Table 18 lists "PyTorch (BSD, GitHub: pytorch/pytorch)" as a codebase used, but it does not specify a version number for PyTorch or any other software components.
Experiment Setup | Yes | We trained the CIFAR-10, CIFAR-100 and ImageNet-subset models for 200 epochs and the SVHN models for 100 epochs. We used the stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a learning rate of 0.1 by default. As for unsupervised learning, all experiments are trained for 500 epochs with the SGD optimizer. The learning rate is 0.5 for SimCLR (Chen et al., 2020a) and 0.3 for MoCo-v2 (Chen et al., 2020b). Please note that we generate sample-wise perturbations for all APAs. Specific settings for each defense method may have slight differences, and detailed information can be found in Appendix A. Table 14: Default training hyperparameter settings. Table 15: Default hyperparameter settings of defenses.
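The default training settings quoted above can be collected into a small configuration helper for anyone re-running the benchmark. This is a sketch encoding only the hyperparameters stated in the excerpt (200 epochs for CIFAR-10/100 and the ImageNet subset, 100 for SVHN, SGD with momentum 0.9 and learning rate 0.1; 500 epochs for unsupervised runs, with learning rate 0.5 for SimCLR and 0.3 for MoCo-v2). The function name and dict layout are illustrative and not taken from the APBench codebase; batch size, weight decay, and schedules are omitted because the excerpt does not state them.

```python
def training_config(dataset: str, method: str = "supervised") -> dict:
    """Return the default training hyperparameters quoted in the report.

    Only values stated in the excerpt are encoded; everything else
    (batch size, weight decay, LR schedule) is deliberately left out.
    """
    if method == "supervised":
        # SVHN trains for 100 epochs; CIFAR-10/100 and the
        # ImageNet subset train for 200 epochs.
        epochs = 100 if dataset == "SVHN" else 200
        return {"optimizer": "SGD", "momentum": 0.9,
                "lr": 0.1, "epochs": epochs}
    # Unsupervised runs: 500 epochs, learning rate depends on the
    # contrastive framework used.
    lrs = {"SimCLR": 0.5, "MoCo-v2": 0.3}
    return {"optimizer": "SGD", "epochs": 500, "lr": lrs[method]}
```

For example, `training_config("SVHN")` yields the 100-epoch supervised defaults, while `training_config("CIFAR-10", "MoCo-v2")` yields the 500-epoch unsupervised settings with learning rate 0.3.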