Corrective Machine Unlearning

Authors: Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we run experiments with two types of manipulations that can occur in real-world data collection pipelines, and test whether existing methods can unlearn their adverse effects. We first describe these manipulations, and then our dataset, model architecture, and unlearning setup.
Researcher Affiliation | Academia | Shashwat Goel (EMAIL; IIIT Hyderabad; ELLIS Institute Tübingen; Max Planck Institute for Intelligent Systems), Ameya Prabhu (University of Oxford; Tübingen AI Center), Philip Torr (University of Oxford), Ponnurangam Kumaraguru (IIIT Hyderabad), Amartya Sanyal (University of Copenhagen; EMAIL)
Pseudocode | No | The paper describes the unlearning methods in prose in Section 3.2 and Appendix A.2, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | We first use the CIFAR10 and CIFAR100 (Krizhevsky et al., 2009) datasets as standard benchmarks in the unlearning literature (Foster et al., 2023; Kurmanji et al., 2023; Chundawat et al., 2023a). We then report poison unlearning results on PCam (Veeling et al., 2018), a binary classification medical imaging dataset, as a potential application.
Dataset Splits | Yes | For all results, metrics are computed on the test set containing unseen samples. The mean and standard deviation are reported over 3 seeds. In the interclass confusion evaluation, for CIFAR10, we confuse the Cat and Dog classes, and for CIFAR100, the maple and oak tree classes, to be consistent with the setup of Goel et al. (2023).
Hardware Specification | Yes | The setup used for all experiments is a PC with an Intel(R) Xeon(R) E5-2640 2.40 GHz CPU, 128 GB RAM, and one GeForce RTX 2080 GPU.
Software Dependencies | No | The paper mentions PyTorch in reference to a ResNet-9 implementation (Idelbayev, 2018) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Our standard training procedure A is as follows: We train our models for 4000 steps on CIFAR10 and PCam, and 6000 steps on CIFAR100. Each step consists of training on a single batch, and we use a batch size of 512 throughout. We use an SGD optimizer with momentum 0.9 and weight decay 5e-4, a linear scheduler with t_mult = 1.25, and warmup steps set to 1/100 of the total training steps. The same hyperparameters are used during unlearning unless otherwise specified.
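The interclass confusion manipulation described in the Dataset Splits row (e.g. confusing the Cat and Dog classes on CIFAR10) can be sketched in a framework-agnostic way. This is a minimal illustration, not the authors' code: the function name, the plain-list label representation, and the manipulation count are all assumptions.

```python
import random

def confuse_classes(labels, class_a, class_b, n_manip, seed=0):
    """Return a copy of `labels` in which `n_manip` samples drawn from the
    two target classes have their labels swapped (class_a -> class_b and
    vice versa), mimicking an interclass-confusion manipulation."""
    rng = random.Random(seed)
    # Indices of all samples belonging to either target class.
    candidates = [i for i, y in enumerate(labels) if y in (class_a, class_b)]
    manipulated = set(rng.sample(candidates, n_manip))
    swapped = list(labels)
    for i in manipulated:
        # Flip the label to the other class of the confused pair.
        swapped[i] = class_b if labels[i] == class_a else class_a
    return swapped, manipulated

# Example: CIFAR10 uses label 3 for Cat and 5 for Dog.
labels = [3, 5, 3, 5, 0, 1]
swapped, idx = confuse_classes(labels, class_a=3, class_b=5, n_manip=2)
```

The returned `manipulated` index set is what an unlearning method would later be asked to "forget"; unaffected classes (labels 0 and 1 above) are left untouched.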
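The learning-rate schedule quoted in the Experiment Setup row (warmup for 1/100 of the total steps, then a "linear scheduler") can be sketched as a per-step function. The excerpt does not fully specify the schedule (t_mult = 1.25 suggests restart-style cycles whose exact form is not given), so this sketch covers only linear warmup followed by a single linear decay, and the base learning rate of 0.1 is an assumption.

```python
def lr_at_step(step, total_steps, base_lr=0.1, warmup_frac=0.01):
    """Learning rate at a given 0-indexed step: linear warmup over the
    first `warmup_frac` of training, then linear decay toward zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp up linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * (1.0 - progress)

# Example: CIFAR10 training runs for 4000 steps, so warmup lasts 40 steps.
lr_start = lr_at_step(0, 4000)      # small initial rate during warmup
lr_peak = lr_at_step(40, 4000)      # full base_lr just after warmup
```

Since the same hyperparameters are reused during unlearning, the same schedule function would apply there with a different `total_steps`.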