Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Authors: Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, Chiyuan Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose white-box and black-box attack algorithms and evaluate them through a case study on image classification tasks using the CIFAR-10 and ImageNet datasets, targeting a family of widely used unlearning methods. Our results show extremely poor test accuracy following the attack: 3.6% on CIFAR-10 and 0.4% on ImageNet for white-box attacks, and 8.5% on CIFAR-10 and 1.3% on ImageNet for black-box attacks.
Researcher Affiliation | Collaboration | Yangsibo Huang (1,2), Daogao Liu (3), Lynn Chua (1), Badih Ghazi (1), Pritish Kamath (1), Ravi Kumar (1), Pasin Manurangsi (1), Milad Nasr (1), Amer Sinha (1), Chiyuan Zhang (1); 1: Google, 2: Princeton University, 3: University of Washington
Pseudocode | Yes | Algorithm 1: White-box attack; Algorithm 2: Black-box attack; Algorithm 3: Black-box attack for ImageNet.
Open Source Code | Yes | Code is available at https://github.com/daogaoliu/unlearning-under-adversary.
Open Datasets | Yes | We evaluate the proposed attack on image classification tasks... CIFAR-10: We use the model provided by the Machine Unlearning Challenge at NeurIPS 2023... ResNet-18 (He et al., 2016) trained on the CIFAR-10 dataset (Krizhevsky et al., 2009). ImageNet: ...ResNeXt-50 model (Xie et al., 2017) pretrained on ImageNet (Deng et al., 2009).
Dataset Splits | Yes | We randomly select examples from the CIFAR-10 training set to form the forget set Dforget, while the rest of the training data forms the retain set Dretain. The CIFAR-10 test set is used as Dholdout. Similarly, we randomly select examples from the ImageNet training set to create the forget set Dforget, with the remaining data forming Dretain. The ImageNet test set is used as Dholdout.
Hardware Specification | Yes | We conduct all the experiments on NVIDIA A100-64GB GPU cards with 4 CPUs.
Software Dependencies | No | In this paper, we use the higher library designed for PyTorch to implement this computation.
Experiment Setup | Yes | For all unlearning algorithms, we use SGD as the optimizer, with a momentum of 0.9 and a weight decay of 5e-4. The (un)learning rate is set to 0.02 for CIFAR-10 and 0.05 for ImageNet. Each unlearning process is run with a batch size of 128 for a single epoch. Table 8: Hyperparameter settings for white-box and black-box attacks used in our experiments.
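The dataset-splits row above describes a simple random partition: forget examples are sampled uniformly at random from the training set, the remainder is retained, and the test set serves as the holdout. A minimal stdlib-only sketch of that partition follows; the function name, seed, and forget-set size are illustrative assumptions, not values taken from the paper.

```python
import random

def make_unlearning_splits(num_train, forget_size, seed=0):
    """Partition training indices into a forget set and a retain set.

    Forget indices are drawn uniformly at random from the training
    set; all remaining indices form the retain set. The test set
    (not indexed here) would play the role of the holdout set.
    """
    rng = random.Random(seed)
    forget = set(rng.sample(range(num_train), forget_size))
    retain = [i for i in range(num_train) if i not in forget]
    return sorted(forget), retain

# Example: CIFAR-10 has 50,000 training images; forget 100 of them.
forget_idx, retain_idx = make_unlearning_splits(50_000, 100)
print(len(forget_idx), len(retain_idx))  # 100 49900
```

The two index lists can then be used to materialize Dforget and Dretain as dataset subsets before running the unlearning procedure described in the experiment-setup row.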