Machine Unlearning Fails to Remove Data Poisoning Attacks

Authors: Martin Pawelczyk, Jimmy Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental We experimentally demonstrate that, while existing unlearning methods have been shown to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget.
Researcher Affiliation Collaboration ¹Harvard University, ²University of Waterloo, ³Vector Institute, ⁴MIT, ⁵Google
Pseudocode Yes The paper provides four algorithms: Algorithm 1, Gaussian Unlearning Score (GUS) (input: model θ to be evaluated); Algorithm 2, Gaussian Data Poisoning to Evaluate Unlearning (input: unlearning algorithm Unlearn-Alg to be evaluated); Algorithm 3, Gradient Matching to generate poisons (Geiping et al., 2021); Algorithm 4, Gradient Canceling (GC) Attack (Lu et al., 2023).
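The precise Gaussian Unlearning Score is given in the paper's Algorithm 1. As a rough illustration of the underlying idea only (not the authors' exact statistic), the score correlates per-sample input gradients with the Gaussian noise that was injected into the poisoned samples; a model that has truly forgotten the poisons should show no such correlation. The sketch below uses a toy logistic-regression model with an analytic input gradient; all function names and the cosine-similarity normalization are our own assumptions.

```python
import numpy as np

def input_gradients(w, X, y):
    """Input gradients of the logistic loss -log(sigmoid(y * <w, x>)), y in {-1, +1}.

    Stand-in for the per-sample input gradients of whatever model is evaluated.
    """
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-margins))            # sigmoid(y * <w, x>)
    return (-(y * (1.0 - s)))[:, None] * w[None, :]

def gaussian_unlearning_score(grad_inputs, noise):
    """Per-sample cosine similarity between input gradients and injected noise.

    Scores concentrated near zero are consistent with the noise having been
    forgotten; a systematic positive correlation indicates residual poison
    influence. (Illustrative normalization, not the paper's exact definition.)
    """
    g = grad_inputs.reshape(len(grad_inputs), -1)
    n = noise.reshape(len(noise), -1)
    return (g * n).sum(axis=1) / (
        np.linalg.norm(g, axis=1) * np.linalg.norm(n, axis=1) + 1e-12
    )
```

In this toy form, one would add `noise` to clean inputs to build the poisoned set, run (un)learning, and then inspect the score distribution on the forget set.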
Open Source Code Yes We release the code for our Gaussian data poisoning method at: https://github.com/MartinPawel/OpenUnlearn.
Open Datasets Yes For the language task, we consider the IMDb dataset (Maas et al., 2011). ... For the vision task, we use the CIFAR-10 dataset (Krizhevsky et al., 2010).
Dataset Splits No The paper discusses using standard datasets but does not specify explicit train/validation/test split details.
Hardware Specification No The paper mentions 'compute budget' and 'computational constraints' but does not specify any particular hardware models like GPUs or CPUs used for the experiments.
Software Dependencies No The paper mentions models like Resnet-18 and GPT-2, and optimizers like SGD and Adam, but does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup Yes Models. For the vision tasks, we train a standard Resnet-18 model for 100 epochs. We conduct the language experiments on GPT-2 (355M parameters) LLMs (Radford et al., 2019). ... We train these models for 10 epochs on the poisoned IMDb training dataset. ... GD using the following hyperparameters: SGD optimizer with lr = 1e-3, momentum = 0.9, and weight_decay = 5e-4. ... NGD using the same hyperparameters as GD with the additional Gaussian noise variance σ² ∈ {1e-07, 1e-06}. ... GA using similar hyperparameters to GD but with a smaller lr ∈ {5e-6, 1e-5}. ... EUk ... with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain K = 3. ... CFk, we experiment with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain set to K = 3. ... Compute budget. ... up to 10% of the compute used in initial training (or fine-tuning) of the model.
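For quick reference, the reported unlearning-baseline hyperparameters can be collected into a single grid. This is a sketch: the dictionary structure and key names are our own, while the values are the ones reported above (lists denote values searched over).

```python
# Hyperparameter grid for the unlearning baselines reported in the paper.
# Key names (e.g. "noise_variance", "layers_retrained") are our own labels.
UNLEARN_HPARAMS = {
    # Gradient Descent on the retain set
    "GD":  {"optimizer": "SGD", "lr": [1e-3], "momentum": 0.9,
            "weight_decay": 5e-4},
    # Noisy Gradient Descent: GD plus added Gaussian noise
    "NGD": {"optimizer": "SGD", "lr": [1e-3], "momentum": 0.9,
            "weight_decay": 5e-4, "noise_variance": [1e-7, 1e-6]},
    # Gradient Ascent on the forget set, with a smaller learning rate
    "GA":  {"optimizer": "SGD", "lr": [5e-6, 1e-5], "momentum": 0.9,
            "weight_decay": 5e-4},
    # Exact Unlearning of the last K layers
    "EUk": {"lr": [1e-3, 1e-4, 1e-5], "layers_retrained": 3},
    # Catastrophic Forgetting of the last K layers
    "CFk": {"lr": [1e-3, 1e-4, 1e-5], "layers_retrained": 3},
}
```

A sweep would then iterate over the listed learning rates (and noise variances for NGD) subject to the stated compute budget of at most 10% of initial training.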