An Information Theoretic Approach to Machine Unlearning

Authors: Jack Foster, Kyle Fogarty, Stefan Schoepf, Zack Dugue, Cengiz Öztireli, Alexandra Brintrup

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method is competitive with state-of-the-art performance under the strict constraints of zero-shot unlearning. Our primary contributions are as follows: ... We provide extensive empirical analysis of the geometry of JiT unlearning in low dimensions. We show our method is competitive with existing SOTA in the zero-shot domain.
Researcher Affiliation | Academia | Jack Foster (EMAIL), Department of Engineering, University of Cambridge, UK; Kyle Fogarty (EMAIL), Department of Computer Science and Technology, University of Cambridge, UK; Stefan Schoepf (EMAIL), Department of Engineering, University of Cambridge, UK; Zack Dugue (EMAIL), Department of Computer Science, California Institute of Technology, United States; Cengiz Öztireli (EMAIL), Department of Computer Science and Technology, University of Cambridge, UK; Alexandra Brintrup (EMAIL), Department of Engineering, University of Cambridge, UK
Pseudocode | Yes | Appendix 10.1 (Method Algorithm) gives the method as Algorithm 1:

Algorithm 1 JiT Unlearning
INPUT: the trained model f_θ(·) and the forget set S
PARAMETERS: η, σ, N
OUTPUT: f_θ̂(·) = U_S(f_θ(·))
1:  Initialise optim(θ, lr = η)
2:  for x in S do
3:      ℓ = 0
4:      for i in range(N) do
5:          x' = x + ε, for ε ∼ N(0, σ²)
6:          ℓ_i = ‖f(x) − f(x')‖₂ / ‖ε‖₂
7:          ℓ = ℓ + ℓ_i
8:      end for
9:      ℓ = ℓ / N
10:     θ̂ ← optim{∇_θ ℓ}
11: end for
12: return f_θ̂(·)
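As a minimal sketch of the Algorithm 1 loop (not the authors' implementation): assuming a hypothetical linear model f(x) = W @ x, the gradient ∇_θ ℓ has a closed form, so the per-sample smoothness penalty and SGD step can be written without an autograd framework. Function name, defaults, and the linear model are illustrative choices only.

```python
import numpy as np

def jit_unlearn_linear(W, forget_set, eta=0.01, sigma=0.1, N=10, seed=0):
    """One JiT-style unlearning pass for a toy linear model f(x) = W @ x.

    For each forget sample x, estimate the local smoothness
    l = mean_i ||f(x) - f(x + eps_i)||_2 / ||eps_i||_2 over N Gaussian
    perturbations eps_i ~ N(0, sigma^2), then take one gradient step on W
    to minimise it (Algorithm 1's theta-hat <- optim{grad_theta l}).
    """
    rng = np.random.default_rng(seed)
    W = W.copy()
    for x in forget_set:
        grad = np.zeros_like(W)
        for _ in range(N):
            eps = rng.normal(0.0, sigma, size=x.shape)
            diff = W @ (x + eps) - W @ x  # equals W @ eps for a linear f
            # d/dW of ||W eps|| / ||eps|| = (W eps) eps^T / (||W eps|| ||eps||)
            grad += np.outer(diff, eps) / (
                np.linalg.norm(diff) * np.linalg.norm(eps) + 1e-12
            )
        W -= eta * (grad / N)  # SGD step with learning rate eta
    return W
```

In this linear setting each step shrinks W along the sampled perturbation directions, flattening the model's output around the forget samples, which is the intuition the algorithm's smoothness term captures.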
Open Source Code | No | The paper does not provide an explicit statement about releasing code, a direct link to a code repository, or mention code in supplementary materials.
Open Datasets | Yes | Datasets: As with previous work, we benchmark JiT on a range of image classification benchmarks. We make use of the CIFAR suite (Krizhevsky and Hinton, 2010), and the Pins Facial Recognition dataset (Burak, 2020), which consists of 17,534 images of 105 celebrity faces.
Dataset Splits | Yes | We implement the same benchmarks from Foster et al. (2023), which are similar to those of Chundawat et al. (2023a), Golatkar et al. (2020) and Kurmanji et al. (2023). ... Unlearning scenarios: Typically the three unlearning scenarios are: i) Full-class forgetting, where a full class from the dataset must be unlearned, ii) Sub-class forgetting, where a related subset of a class (e.g. all rockets from class vehicle) is forgotten, and iii) Random forgetting, where a subset is sampled uniformly from the entire training distribution.
Hardware Specification | Yes | Models: We evaluate methods on Vision Transformer (ViT) (Dosovitskiy et al., 2021) and VGG11 (Simonyan and Zisserman, 2014), trained on an NVIDIA RTX 4090 using Stochastic Gradient Descent with an initial learning rate of 0.1, and the One Cycle learning rate scheduler (Smith and Topin, 2019).
Software Dependencies | No | The paper mentions software tools like Optuna (Akiba et al., 2019), but does not provide specific version numbers for key software components such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Models: We evaluate methods on Vision Transformer (ViT) (Dosovitskiy et al., 2021) and VGG11 (Simonyan and Zisserman, 2014), trained on an NVIDIA RTX 4090 using Stochastic Gradient Descent with an initial learning rate of 0.1, and the One Cycle learning rate scheduler (Smith and Topin, 2019). ... JiT hyper-parameters: We conduct a hyper-parameter search for η and σ using 250 runs of the TPE search from Optuna (Akiba et al., 2019), for each unlearning scenario. For VGG11, we use the following parameters: full-class unlearning uses η = 0.0003, σ = 0.5; sub-class and random both use η = 0.0003, σ = 0.01. For ViT, the selected parameters are: full-class η = 1.5, σ = 0.8; sub-class η = 0.5, σ = 1.5; and random η = 0.01, σ = 0.5. ViT and VGG use considerably different learning rates, since only a single epoch is used during the unlearning step.
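The paper's search used Optuna's TPE sampler over 250 trials per scenario. As a dependency-free illustration of the same idea, the sketch below runs a plain random search over (η, σ); the objective function and search bounds are hypothetical stand-ins, since the real objective would run JiT unlearning and score the forget/retain metrics on the benchmark.

```python
import numpy as np

def toy_objective(eta, sigma):
    # Hypothetical stand-in for the real benchmark score (lower is better);
    # shaped so the optimum lies near eta = 10**-3.5, sigma = 0.5.
    return (np.log10(eta) + 3.5) ** 2 + (sigma - 0.5) ** 2

rng = np.random.default_rng(0)
best = (None, None, float("inf"))  # (eta, sigma, score)
for _ in range(250):  # 250 trials, matching the paper's budget
    eta = 10 ** rng.uniform(-4, 1)   # log-uniform over [1e-4, 10]
    sigma = rng.uniform(0.01, 1.5)   # uniform over a plausible noise range
    score = toy_objective(eta, sigma)
    if score < best[2]:
        best = (eta, sigma, score)
```

Optuna's TPE sampler would replace the uniform draws with a model-based proposal that concentrates trials in promising regions, which typically finds good (η, σ) pairs in far fewer evaluations than random search.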