Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Authors: Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, Chiyuan Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose white-box and black-box attack algorithms and evaluate them through a case study on image classification tasks using the CIFAR-10 and ImageNet datasets, targeting a family of widely used unlearning methods. Our results show extremely poor test accuracy following the attack: 3.6% on CIFAR-10 and 0.4% on ImageNet for white-box attacks, and 8.5% on CIFAR-10 and 1.3% on ImageNet for black-box attacks. |
| Researcher Affiliation | Collaboration | Yangsibo Huang¹˒², Daogao Liu³, Lynn Chua¹, Badih Ghazi¹, Pritish Kamath¹, Ravi Kumar¹, Pasin Manurangsi¹, Milad Nasr¹, Amer Sinha¹, Chiyuan Zhang¹ (¹Google, ²Princeton University, ³University of Washington) |
| Pseudocode | Yes | Algorithm 1: White-box attack. Algorithm 2: Black-box attack. Algorithm 3: Black-box attack for ImageNet. |
| Open Source Code | Yes | Code is available at https://github.com/daogaoliu/unlearning-under-adversary. |
| Open Datasets | Yes | We evaluate the proposed attack on image classification tasks... CIFAR-10: We use the model provided by the Machine Unlearning Challenge at NeurIPS 2023... ResNet-18 (He et al., 2016) trained on the CIFAR-10 dataset (Krizhevsky et al., 2009). ImageNet: ...ResNeXt-50 model (Xie et al., 2017) pretrained on ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | We randomly select examples from the CIFAR-10 training set to form the forget set D_forget, while the rest of the training data forms the retain set D_retain. The CIFAR-10 test set is used as D_holdout. Similar to CIFAR-10, we randomly select examples from the ImageNet training set to create the forget set D_forget, with the remaining data forming D_retain. The ImageNet test set is used as D_holdout. |
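The split described above can be sketched as a simple random partition over training-set indices. This is an illustrative reconstruction, not the authors' code: the function name, seed, and forget-set size are assumptions, and in practice the indices would be used to subset the CIFAR-10 or ImageNet training data.

```python
import random

def split_forget_retain(num_train, forget_size, seed=0):
    """Randomly pick `forget_size` indices from a training set of
    `num_train` examples as the forget set; the remaining indices
    form the retain set. The held-out test set is kept separate."""
    rng = random.Random(seed)
    indices = list(range(num_train))
    rng.shuffle(indices)
    forget = sorted(indices[:forget_size])
    retain = sorted(indices[forget_size:])
    return forget, retain

# e.g. CIFAR-10 has 50,000 training images; forget-set size is illustrative
forget_idx, retain_idx = split_forget_retain(50_000, 100)
```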
| Hardware Specification | Yes | We conduct all the experiments on NVIDIA A100-64GB GPU cards with 4 CPUs. |
| Software Dependencies | No | In this paper, we use the higher library designed for PyTorch to implement this computation. |
| Experiment Setup | Yes | For all unlearning algorithms, we use SGD as the optimizer, with a momentum of 0.9 and a weight decay of 5e-4. The (un)learning rate is set to 0.02 for CIFAR-10 and 0.05 for ImageNet. Each unlearning process is run with a batch size of 128 for a single epoch. Table 8: Hyperparameter settings for white-box and black-box attacks used in our experiments. |
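The reported optimizer settings map directly onto a standard PyTorch configuration. A minimal sketch, assuming the CIFAR-10 unlearning rate (0.05 would be used for ImageNet); the placeholder model is purely illustrative, not the paper's ResNet-18:

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder; the paper uses ResNet-18 / ResNeXt-50

# SGD with momentum 0.9 and weight decay 5e-4, as reported;
# lr=0.02 for CIFAR-10 (0.05 for ImageNet), batch size 128, one epoch.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.02, momentum=0.9, weight_decay=5e-4
)
```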