Flexible, Efficient, and Stable Adversarial Attacks on Machine Unlearning

Authors: Zihan Zhou, Yang Zhou, Zijie Zhang, Lingjuan Lyu, Da Yan, Ruoming Jin, Dejing Dou

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the attack success rate of our DDPA method on real datasets against state-of-the-art machine unlearning attack methods. Our source code is available at https://github.com/zzz0134/DDPA. Empirical evaluation on real datasets demonstrates the superior performance of our DDPA MU attack model against several state-of-the-art methods on image classification. More experiments, implementation details, and hyperparameter settings are presented in Appendix F.
Researcher Affiliation Collaboration 1Auburn University, USA 2University of Texas at San Antonio, USA 3Sony AI, Japan 4Indiana University Bloomington, USA 5Kent State University, USA 6Fudan University, China 7BEDI Cloud, China. Correspondence to: Yang Zhou <EMAIL>.
Pseudocode Yes By assembling different pieces together, we provide the pseudo code of our DDPA method in Algorithm 1 in Appendix D.
Open Source Code Yes Our source code is available at https://github.com/zzz0134/DDPA.
Open Datasets Yes Datasets and Models. We conduct experiments on two widely-used image classification datasets and one sentiment classification dataset: CIFAR-100 (Krizhevsky, 2009), Tiny ImageNet (Le & Yang, 2015), and SST-2 (Socher et al., 2013). The datasets are publicly available and are widely used for non-commercial research and educational purposes.
Dataset Splits Yes For CIFAR-100, we use 50,000 examples for training and 10,000 examples for testing, training a VGG16 model for image classification over 150 epochs. On Tiny ImageNet, we use 100,000 examples for training and 10,000 examples for testing, training a ResNet-18 model for image classification over 150 epochs. For SST-2, we use 20,000 examples for training and 872 examples for testing, fine-tuning a LLaMA-3B model with LoRA for sentiment classification over 10 epochs.
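The dataset/model pairings quoted above can be summarized in a small config table. This is a minimal sketch for reference only: the dictionary name and structure are our own and do not come from the released DDPA code.

```python
# Hedged sketch: the splits, models, and epoch counts quoted above,
# collected into one config table (names/structure are assumptions).
EXPERIMENTS = {
    "CIFAR-100":     {"train": 50_000,  "test": 10_000, "model": "VGG16",           "epochs": 150},
    "Tiny ImageNet": {"train": 100_000, "test": 10_000, "model": "ResNet-18",       "epochs": 150},
    "SST-2":         {"train": 20_000,  "test": 872,    "model": "LLaMA-3B + LoRA", "epochs": 10},
}

for name, cfg in EXPERIMENTS.items():
    print(f"{name}: {cfg['model']}, {cfg['epochs']} epochs "
          f"({cfg['train']:,} train / {cfg['test']:,} test)")
```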
Hardware Specification Yes The experiments were conducted on a compute server running Red Hat Enterprise Linux 7.2 with 2 CPUs of Intel Xeon E5-2650 v4 (at 2.66 GHz), 8 GPUs of NVIDIA GeForce RTX 2080 Ti (with 11 GB of GDDR6 on a 352-bit memory bus and memory bandwidth in the neighborhood of 620 GB/s), 4 GPUs of NVIDIA H100 (each with 80 GB of HBM2e memory on a 5120-bit memory bus, offering a memory bandwidth of approximately 3 TB/s), 256 GB of RAM, and 1 TB of HDD.
Software Dependencies Yes The code was implemented in Python 3.7.10 and PyTorch 1.9.0.
Experiment Setup Yes All neural networks are trained using SGD optimization, starting with an initial learning rate of 0.1 and a batch size of 64. For the image datasets, CIFAR-100 and Tiny ImageNet, all models were trained for 150 epochs using a batch size of 128 and a learning rate of 0.1. For the sentiment dataset SST-2, all models were trained for 50 epochs with a batch size of 8 and a learning rate of 4e-4. Unless otherwise explicitly stated, the default parameter settings used in the experiments are shown in Table 4.
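The training setup quoted above (SGD, learning rate 0.1, batch size 128 for the image datasets) can be sketched in PyTorch, the framework the authors report using. This is an illustrative stand-in, not the paper's training loop: the toy linear model and random batch are assumptions made so the snippet stays self-contained.

```python
# Hedged sketch of the reported optimizer setup: SGD with lr 0.1 and a
# batch size of 128, as described for the image-classification experiments.
# The model here is a toy stand-in for VGG16/ResNet-18.
import torch
import torch.nn as nn

model = nn.Linear(32, 100)  # stand-in; 100 outputs as in CIFAR-100
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch of 128 examples.
x = torch.randn(128, 32)
y = torch.randint(0, 100, (128,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.3f}")
```

In a full run this step would repeat over the training set for the reported 150 epochs (10 for SST-2 with LoRA), typically with a learning-rate schedule the excerpt does not specify.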