Machine Unlearning Fails to Remove Data Poisoning Attacks

Authors: Martin Pawelczyk, Jimmy Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental We experimentally demonstrate that, while existing unlearning methods have been shown to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget.
Researcher Affiliation Collaboration ¹Harvard University, ²University of Waterloo, ³Vector Institute, ⁴MIT, ⁵Google
Pseudocode Yes The paper provides four algorithms: Algorithm 1, Gaussian Unlearning Score (GUS) (input: model θ to be evaluated); Algorithm 2, Gaussian Data Poisoning to Evaluate Unlearning (input: unlearning algorithm Unlearn-Alg to be evaluated); Algorithm 3, Gradient Matching to generate poisons (Geiping et al., 2021); Algorithm 4, Gradient Canceling (GC) Attack (Lu et al., 2023).
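The precise Gaussian Unlearning Score is given in the paper's Algorithm 1. As a rough illustration of the underlying idea only (not the authors' exact statistic), the score correlates per-sample input gradients with the Gaussian noise that was injected into the poisoned samples; a model that has truly forgotten the poisons should show no such correlation. The sketch below uses a toy logistic-regression model with an analytic input gradient; all function names and the cosine-similarity normalization are our own assumptions.

```python
import numpy as np

def input_gradients(w, X, y):
    """Input gradients of the logistic loss -log(sigmoid(y * <w, x>)), y in {-1, +1}.

    Stand-in for the per-sample input gradients of whatever model is evaluated.
    """
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-margins))            # sigmoid(y * <w, x>)
    return (-(y * (1.0 - s)))[:, None] * w[None, :]

def gaussian_unlearning_score(grad_inputs, noise):
    """Per-sample cosine similarity between input gradients and injected noise.

    Scores concentrated near zero are consistent with the noise having been
    forgotten; a systematic positive correlation indicates residual poison
    influence. (Illustrative normalization, not the paper's exact definition.)
    """
    g = grad_inputs.reshape(len(grad_inputs), -1)
    n = noise.reshape(len(noise), -1)
    return (g * n).sum(axis=1) / (
        np.linalg.norm(g, axis=1) * np.linalg.norm(n, axis=1) + 1e-12
    )
```

In this toy form, one would add `noise` to clean inputs to build the poisoned set, run (un)learning, and then inspect the score distribution on the forget set.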
Open Source Code Yes We release the code for our Gaussian data poisoning method at: https://github.com/MartinPawel/OpenUnlearn.
Open Datasets Yes For the language task, we consider the IMDb dataset (Maas et al., 2011). ... For the vision task, we use the CIFAR-10 dataset (Krizhevsky et al., 2010).
Dataset Splits No The paper discusses using standard datasets but does not specify explicit train/validation/test split details.
Hardware Specification No The paper mentions 'compute budget' and 'computational constraints' but does not specify any particular hardware models like GPUs or CPUs used for the experiments.
Software Dependencies No The paper mentions models like Resnet-18 and GPT-2, and optimizers like SGD and Adam, but does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup Yes Models. For the vision tasks, we train a standard Resnet-18 model for 100 epochs. We conduct the language experiments on GPT-2 (355M parameters) LLMs (Radford et al., 2019). ... We train these models for 10 epochs on the poisoned IMDb training dataset. ... GD using the following hyperparameters: SGD optimizer with lr = 1e-3, momentum = 0.9, and weight_decay = 5e-4. ... NGD using the same hyperparameters as GD with the additional Gaussian noise variance σ² ∈ {1e-07, 1e-06}. ... GA using similar hyperparameters to GD but with a smaller lr ∈ {5e-6, 1e-5}. ... EUk ... with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain K = 3. ... CFk, we experiment with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain set to K = 3. ... Compute budget. ... up to 10% of the compute used in initial training (or fine-tuning) of the model.
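For quick reference, the reported unlearning-baseline hyperparameters can be collected into a single grid. This is a sketch: the dictionary structure and key names are our own, while the values are the ones reported above (lists denote values searched over).

```python
# Hyperparameter grid for the unlearning baselines reported in the paper.
# Key names (e.g. "noise_variance", "layers_retrained") are our own labels.
UNLEARN_HPARAMS = {
    # Gradient Descent on the retain set
    "GD":  {"optimizer": "SGD", "lr": [1e-3], "momentum": 0.9,
            "weight_decay": 5e-4},
    # Noisy Gradient Descent: GD plus added Gaussian noise
    "NGD": {"optimizer": "SGD", "lr": [1e-3], "momentum": 0.9,
            "weight_decay": 5e-4, "noise_variance": [1e-7, 1e-6]},
    # Gradient Ascent on the forget set, with a smaller learning rate
    "GA":  {"optimizer": "SGD", "lr": [5e-6, 1e-5], "momentum": 0.9,
            "weight_decay": 5e-4},
    # Exact Unlearning of the last K layers
    "EUk": {"lr": [1e-3, 1e-4, 1e-5], "layers_retrained": 3},
    # Catastrophic Forgetting of the last K layers
    "CFk": {"lr": [1e-3, 1e-4, 1e-5], "layers_retrained": 3},
}
```

A sweep would then iterate over the listed learning rates (and noise variances for NGD) subject to the stated compute budget of at most 10% of initial training.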