Efficient Knowledge Deletion from Trained Models Through Layer-wise Partial Machine Unlearning
Authors: Vinay Chakravarthi Gogineni, Esmaeil S. Nadimi
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a detailed experimental evaluation, we showcase the effectiveness of the proposed unlearning methods. Experimental results highlight that partial amnesiac unlearning not only preserves model efficacy but also eliminates the need for brief fine-tuning after unlearning, unlike conventional amnesiac unlearning. Further, employing layer-wise partial updates in label-flipping and optimization-based unlearning techniques demonstrates superiority in preserving model efficacy compared to their naive counterparts. We conduct a comprehensive empirical assessment to demonstrate the effectiveness of the proposed unlearning methods, wherein the membership inference metric serves as a key performance indicator. Our analysis encompasses diverse data sets such as MNIST (LeCun et al., 2010), Street View House Numbers (SVHN) (Netzer et al., 2011), CIFAR10 (Krizhevsky and Hinton, 2009), and Medical MNIST (MEDMNIST) (Yang et al., 2022). |
| Researcher Affiliation | Academia | Vinay Chakravarthi Gogineni (EMAIL), Applied AI and Data Science Unit, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, 5230 Odense M; Esmaeil S. Nadimi (EMAIL), Applied AI and Data Science Unit, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Campusvej 55, 5230 Odense M |
| Pseudocode | Yes | Algorithm 1 Partial Amnesiac Unlearning Algorithm 2 Layer-wise Partial Updates Induced Label-Flipping-based Unlearning Algorithm 3 Layer-wise Partial Updates Induced Optimization-based Unlearning |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository or an explicit statement about releasing the code for the methodology described in this paper. |
| Open Datasets | Yes | 1. MNIST Handwritten Digits data set (LeCun et al., 2010): a widely recognized benchmark data set. 2. Street View House Numbers (SVHN) data set (Netzer et al., 2011). 3. CIFAR10 Image data set (Krizhevsky and Hinton, 2009). 4. MEDMNIST-OrganAMNIST Medical Image data set (Yang et al., 2022): a publicly available benchmark data set for medical image analysis. |
| Dataset Splits | Yes | 1. MNIST Handwritten Digits data set (LeCun et al., 2010): ...comprises a total of 70,000 grayscale images, with 60,000 images designated for training and 10,000 images for testing. 2. Street View House Numbers (SVHN) data set (Netzer et al., 2011): ...comprises 99,289 color images in the training set, 73,257 color images in the test set. 3. CIFAR10 Image data set (Krizhevsky and Hinton, 2009): ...consisting of a total of 60,000 color images, with 50,000 images allocated for training and 10,000 images for testing. 4. MEDMNIST-OrganAMNIST Medical Image data set (Yang et al., 2022): ...the data set is partitioned into 34,581 training images, 6,491 validation images, and 17,778 testing images. |
| Hardware Specification | Yes | All experiments were performed in Python 3.11 and use the PyTorch deep learning library (Paszke et al., 2019). The system was equipped with an NVIDIA Tesla V100 GPU with 48GB of memory. |
| Software Dependencies | Yes | All experiments were performed in Python 3.11 and use the PyTorch deep learning library (Paszke et al., 2019). |
| Experiment Setup | Yes | Initially, various models were trained for 8 epochs with a batch size of 128 across different data sets: a 2-layer MLP, LeNet, and ResNet9 for MNIST; VGG11, ResNet18, and SimpleViT for CIFAR-10; VGG19, ResNet18, and SimpleViT for SVHN; and AlexNet, ResNet50, and ViT-Large for OrganAMNIST. The learning rate was set to 0.001 for shallow networks, while deeper networks were trained with a lower learning rate of either 0.0005 or 0.0001, depending on their depth and complexity. |
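The three algorithms above are listed only by name. As a rough illustration of the amnesiac-unlearning family they build on — subtracting stored per-batch parameter updates for batches that contained the data to be forgotten, but subtracting only part of each update — the sketch below uses a magnitude-based selection rule. The `keep_frac` parameter and that selection rule are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def partial_amnesiac_unlearn(params, sensitive_updates, keep_frac=0.5):
    """Subtract only the largest-magnitude fraction of each stored
    per-batch update from the final parameters.

    `params` and each entry of `sensitive_updates` are lists of arrays,
    one per layer/tensor. The magnitude-based selection is a hypothetical
    rule standing in for the paper's partial-update scheme.
    """
    new_params = [p.copy() for p in params]
    for update in sensitive_updates:           # one update per sensitive batch
        for i, delta in enumerate(update):     # one delta per layer
            flat = np.abs(delta).ravel()
            k = max(1, int(keep_frac * flat.size))
            thresh = np.partition(flat, -k)[-k]    # k-th largest magnitude
            mask = np.abs(delta) >= thresh         # may keep >k entries on ties
            new_params[i] = new_params[i] - delta * mask
    return new_params
```

With `keep_frac=1.0` this reduces to conventional amnesiac unlearning (subtracting the full stored updates); smaller values leave most parameters untouched, which is the intuition behind preserving model efficacy without post-unlearning fine-tuning.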
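The "layer-wise partial updates" idea in the label-flipping variant — randomly relabeling the forget set and then restricting the corrective update to a subset of layers — can be sketched abstractly as follows. The single-step update and the explicit `update_layers` selection are illustrative assumptions; the paper's Algorithm 2 may differ in both respects:

```python
import numpy as np

def flip_labels(y, num_classes, rng):
    """Replace each forget-set label with a uniformly random *different* class."""
    shift = rng.integers(1, num_classes, size=y.shape)  # shift in [1, C-1]
    return (y + shift) % num_classes

def layerwise_partial_unlearn(layers, grads, lr, update_layers):
    """Apply one SGD step computed on the flipped-label loss, but only to
    the layers whose indices are in `update_layers` (assumed selection);
    all other layers are left unchanged."""
    return [w - lr * g if i in update_layers else w.copy()
            for i, (w, g) in enumerate(zip(layers, grads))]
```

Restricting the step to a few layers is what distinguishes this from the "naive" counterpart that updates every layer, which is the comparison the Research Type excerpt refers to.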
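The report identifies the membership inference metric as the key performance indicator but does not reproduce it. A common, simple instantiation — a confidence-thresholding attack, which may differ from the paper's exact metric — measures how well the model's confidence separates forgotten training points from unseen test points; after successful unlearning this separation should approach chance:

```python
import numpy as np

def mia_accuracy(conf_forget, conf_test, threshold=0.5):
    """Confidence-threshold membership inference (illustrative metric):
    points whose top-class confidence exceeds `threshold` are predicted
    'member'. Returns the attack's accuracy; ~0.5 means the forget set
    is indistinguishable from held-out data."""
    preds_member = np.concatenate([conf_forget, conf_test]) > threshold
    truth = np.concatenate([np.ones(len(conf_forget), dtype=bool),
                            np.zeros(len(conf_test), dtype=bool)])
    return float(np.mean(preds_member == truth))
```

Before unlearning, forget-set confidences are typically high and the attack scores well above 0.5; a drop toward 0.5 after unlearning is the signal the metric is meant to capture.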