A Certified Unlearning Approach without Access to Source Data

Authors: Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA; (2) Brookhaven National Laboratory, Upton, NY, USA.
Pseudocode | Yes | Algorithm 1: Unlearning Mechanism Leveraging Surrogate Data Statistics
Open Source Code | Yes | Our main implementation used for this paper is available at https://github.com/info-ucr/certified-unlearning-surr-data. We also implemented the mixed-linear networks (Golatkar et al., 2021) from scratch; that code is available at https://github.com/info-ucr/mixed-privacy-forgetting.
Open Datasets | Yes | We further evaluate our method on CIFAR10 (Krizhevsky et al., 2009), Caltech256 (Griffin et al., 2007), and Stanford Dogs (Khosla et al., 2011)... MNIST (Lecun et al., 1998) and USPS (Hull, 1994) datasets
Dataset Splits | Yes | Unless otherwise noted, we adopt a linear training model with forget ratio of 0.1... We evaluate the performance using train, test, retain, and forget accuracies on their respective data splits... In both cases, our method achieves effective certified unlearning and maintains competitive accuracy on the retained data, demonstrating that mixed linear networks provide a practical and theoretically sound foundation for unlearning in neural models. MIA scores are omitted for class unlearning because the attack is designed to distinguish between test and forget samples; forgetting an entire class greatly increases distinguishability, making the MIA score uninformative.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | Unless stated otherwise, we use a linear training model with privacy parameters ϵ = 5e3 and δ = 1, a forget ratio of 0.1, and an L2 regularization constant of λ = 0.01... We set α = 1+λ, L = 1, β = 1, and γ = 1. To sample from the marginal distribution of the exact data, we used Stochastic Gradient Langevin Dynamics (SGLD) with step size 0.02 and generated 1000 samples, applying 4000 random-update iterations per generated sample. After sampling, to estimate the KL divergence via the Donsker-Varadhan variational bound, we trained a network with three linear layers for 500 epochs with learning rate 0.0001 using the Adam optimizer.
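The SGLD procedure described in the setup (step size 0.02, many Langevin updates per generated sample) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name is invented, and the demo targets a toy standard normal (whose score is grad log p(x) = -x) rather than the marginal distribution of the exact data; the demo also uses far fewer samples and iterations than the paper's 1000 samples x 4000 iterations so it runs quickly.

```python
import numpy as np

def sgld_sample(grad_log_p, dim, step_size=0.02, n_samples=100, n_iters=200, seed=0):
    """Draw samples with Stochastic Gradient Langevin Dynamics (SGLD).

    Each sample starts from a fresh Gaussian initialization and is refined by
    repeated Langevin updates:
        x <- x + (step_size / 2) * grad_log_p(x) + sqrt(step_size) * noise
    """
    rng = np.random.default_rng(seed)
    samples = np.empty((n_samples, dim))
    for i in range(n_samples):
        x = rng.standard_normal(dim)
        for _ in range(n_iters):
            noise = rng.standard_normal(dim)
            x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
        samples[i] = x
    return samples

# Demo: sample from a standard normal target, where grad log p(x) = -x.
draws = sgld_sample(lambda x: -x, dim=1, n_samples=200, n_iters=200)
```

With the score of the true target plugged in for `grad_log_p`, the empirical mean and variance of `draws` approach those of the target as the step size shrinks and the iteration count grows.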
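The KL estimation step can likewise be sketched. The paper trains a three-layer network with Adam (learning rate 0.0001, 500 epochs) as the Donsker-Varadhan critic; the stand-in below instead uses a linear critic trained by plain gradient ascent, which keeps the example short and is already optimal when P and Q are mean-shifted Gaussians. The function name, learning rate, and step count here are illustrative assumptions, not the paper's values.

```python
import numpy as np

def dv_kl_estimate(p_samples, q_samples, lr=0.1, steps=500):
    """Estimate KL(P || Q) via the Donsker-Varadhan variational bound

        KL(P || Q) >= E_P[T(x)] - log E_Q[exp(T(x))]

    maximized over a linear critic T(x) = w @ x by gradient ascent.
    """
    w = np.zeros(p_samples.shape[1])
    for _ in range(steps):
        tq = q_samples @ w
        soft = np.exp(tq - tq.max())
        soft /= soft.sum()  # softmax weights: gradient of log E_Q[exp(T)]
        grad = p_samples.mean(axis=0) - soft @ q_samples
        w += lr * grad      # ascend the DV lower bound
    tp, tq = p_samples @ w, q_samples @ w
    m = tq.max()            # log-sum-exp shift for numerical stability
    return tp.mean() - (m + np.log(np.mean(np.exp(tq - m))))

# Demo: P = N(1, 1), Q = N(0, 1), whose true KL divergence is 0.5.
rng = np.random.default_rng(0)
kl = dv_kl_estimate(rng.normal(1.0, 1.0, (2000, 1)),
                    rng.normal(0.0, 1.0, (2000, 1)))
```

Because the optimal critic for two unit-variance Gaussians is linear, the estimate converges near the true value 0.5; a multilayer critic, as in the paper, is needed when the log density ratio is nonlinear.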