Machine Unlearning Fails to Remove Data Poisoning Attacks
Authors: Martin Pawelczyk, Jimmy Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. |
| Researcher Affiliation | Collaboration | 1Harvard University, 2University of Waterloo, 3Vector Institute, 4MIT, 5Google |
| Pseudocode | Yes | Algorithm 1 Gaussian Unlearning Score (GUS) Input: Model θ to be evaluated. Algorithm 2 Gaussian Data Poisoning to Evaluate Unlearning Input: Unlearning algorithm Unlearn-Alg to be evaluated. Algorithm 3 Gradient Matching to generate poisons (Geiping et al., 2021) Algorithm 4 Gradient Canceling (GC) Attack (Lu et al., 2023) |
| Open Source Code | Yes | We release the code for our Gaussian data poisoning method at: https://github.com/MartinPawel/OpenUnlearn. |
| Open Datasets | Yes | For the language task, we consider the IMDb dataset (Maas et al., 2011). ... For the vision task, we use the CIFAR-10 dataset (Krizhevsky et al., 2010). |
| Dataset Splits | No | The paper discusses using standard benchmark datasets (IMDb and CIFAR-10) but does not specify explicit train/validation/test splits or split percentages for its experiments. |
| Hardware Specification | No | The paper mentions 'compute budget' and 'computational constraints' but does not specify any particular hardware models like GPUs or CPUs used for the experiments. |
| Software Dependencies | No | The paper mentions models like Resnet-18 and GPT-2, and optimizers like SGD and Adam, but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Models. For the vision tasks, we train a standard Resnet-18 model for 100 epochs. We conduct the language experiments on GPT-2 (355M parameters) LLMs (Radford et al., 2019). ... We train these models for 10 epochs on the poisoned IMDb training dataset. ... GD using the following hyperparameters: SGD optimizer with lr = 1e-3, momentum = 0.9, and weight_decay = 5e-4. ... NGD using the same hyperparameters as GD with the additional Gaussian noise variance σ² ∈ {1e-07, 1e-06}. ... GA using similar hyperparameters to GD but with a smaller lr ∈ {5e-6, 1e-5}. ... EUk ... with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain K = 3. ... CFk, we experiment with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain set to K = 3. ... Compute budget. ... up to 10% of the compute used in initial training (or fine-tuning) of the model. |
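The Gaussian Unlearning Score (Algorithm 1) referenced in the table can be illustrated with a short sketch. This is a hedged, minimal numpy illustration, not the authors' released implementation: it assumes the score is computed per sample as the normalized inner product between the Gaussian poison noise added to an input and the unlearned model's input gradient, so that under perfect unlearning each statistic behaves like a standard normal, while residual memorization of the poisons shifts it upward. The function name and the synthetic "memorized" gradients below are illustrative assumptions.

```python
import numpy as np

def gaussian_unlearning_score(grads, noises, sigma):
    """Sketch of a GUS-style statistic (assumption, not the paper's exact code).

    For each sample i, compute z_i = <g_i, d_i> / (sigma * ||g_i||), where
    d_i ~ N(0, sigma^2 I) is the poison noise and g_i is the model's input
    gradient after unlearning. If unlearning removed the poison's effect,
    g_i is independent of d_i and z_i is approximately N(0, 1); alignment
    between gradients and noise (leftover memorization) inflates z_i.
    """
    z = []
    for g, d in zip(grads, noises):
        denom = sigma * np.linalg.norm(g) + 1e-12  # guard against zero gradient
        z.append(float(np.dot(g, d) / denom))
    return np.array(z)

# Toy demonstration with synthetic gradients (illustrative only).
rng = np.random.default_rng(0)
dim, n, sigma = 100, 50, 0.1
noises = [sigma * rng.standard_normal(dim) for _ in range(n)]

# A model that memorized the poisons: gradients align with the noise.
grads_memorized = [d + 0.01 * rng.standard_normal(dim) for d in noises]
z_mem = gaussian_unlearning_score(grads_memorized, noises, sigma)

# A model where unlearning worked: gradients are independent of the noise.
grads_clean = [rng.standard_normal(dim) for _ in range(n)]
z_clean = gaussian_unlearning_score(grads_clean, noises, sigma)
```

In this toy setup `z_mem` concentrates well above zero while `z_clean` stays near a standard normal, mirroring how the paper uses the score to detect that unlearning methods fail to erase the poisons.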