Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Authors: Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, Chiyuan Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose white-box and black-box attack algorithms and evaluate them through a case study on image classification tasks using the CIFAR-10 and ImageNet datasets, targeting a family of widely used unlearning methods. Our results show extremely poor test accuracy following the attack: 3.6% on CIFAR-10 and 0.4% on ImageNet for white-box attacks, and 8.5% on CIFAR-10 and 1.3% on ImageNet for black-box attacks.
Researcher Affiliation | Collaboration | Yangsibo Huang (1,2), Daogao Liu (3), Lynn Chua (1), Badih Ghazi (1), Pritish Kamath (1), Ravi Kumar (1), Pasin Manurangsi (1), Milad Nasr (1), Amer Sinha (1), Chiyuan Zhang (1); 1: Google, 2: Princeton University, 3: University of Washington
Pseudocode | Yes | Algorithm 1: White-box attack; Algorithm 2: Black-box attack; Algorithm 3: Black-box attack for ImageNet.
Open Source Code | Yes | Code is available at https://github.com/daogaoliu/unlearning-under-adversary.
Open Datasets | Yes | We evaluate the proposed attack on image classification tasks... CIFAR-10: We use the model provided by the Machine Unlearning Challenge at NeurIPS 2023... ResNet-18 (He et al., 2016) trained on the CIFAR-10 dataset (Krizhevsky et al., 2009). ImageNet: ...ResNeXt-50 model (Xie et al., 2017) pretrained on ImageNet (Deng et al., 2009).
Dataset Splits | Yes | We randomly select examples from the CIFAR-10 training set to form the forget set Dforget, while the rest of the training data forms the retain set Dretain. The CIFAR-10 test set is used as Dholdout. Similarly, we randomly select examples from the ImageNet training set to create the forget set Dforget, with the remaining data forming Dretain. The ImageNet test set is used as Dholdout.
Hardware Specification | Yes | We conduct all the experiments on NVIDIA A100-64GB GPU cards with 4 CPUs.
Software Dependencies | No | In this paper, we use the higher library designed for PyTorch to implement this computation.
Experiment Setup | Yes | For all unlearning algorithms, we use SGD as the optimizer, with a momentum of 0.9 and a weight decay of 5e-4. The (un)learning rate is set to 0.02 for CIFAR-10 and 0.05 for ImageNet. Each unlearning process is run with a batch size of 128 for a single epoch. Table 8: Hyperparameter settings for white-box and black-box attacks used in our experiments.
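The dataset-splits row above describes a simple random partition: forget examples are sampled uniformly at random from the training set, the remainder is retained, and the test set serves as the holdout. A minimal stdlib-only sketch of that partition follows; the function name, seed, and forget-set size are illustrative assumptions, not values taken from the paper.

```python
import random

def make_unlearning_splits(num_train, forget_size, seed=0):
    """Partition training indices into a forget set and a retain set.

    Forget indices are drawn uniformly at random from the training
    set; all remaining indices form the retain set. The test set
    (not indexed here) would play the role of the holdout set.
    """
    rng = random.Random(seed)
    forget = set(rng.sample(range(num_train), forget_size))
    retain = [i for i in range(num_train) if i not in forget]
    return sorted(forget), retain

# Example: CIFAR-10 has 50,000 training images; forget 100 of them.
forget_idx, retain_idx = make_unlearning_splits(50_000, 100)
print(len(forget_idx), len(retain_idx))  # 100 49900
```

The two index lists can then be used to materialize Dforget and Dretain as dataset subsets before running the unlearning procedure described in the experiment-setup row.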