Low Compute Unlearning via Sparse Representations
Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Curtis Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed technique on the problem of class unlearning using four datasets: CIFAR-10, CIFAR-100, LACUNA-100 and ImageNet-1k. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all four datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost. |
| Researcher Affiliation | Academia | Vedant Shah (Mila, Université de Montréal); Frederik Träuble (MPI, Tübingen); Ashish Malik (University of Oregon); Hugo Larochelle (Mila, Université de Montréal); Michael Mozer (University of Colorado, Boulder); Sanjeev Arora (Princeton University); Yoshua Bengio (Mila, Université de Montréal); Anirudh Goyal (Mila) |
| Pseudocode | Yes | Algorithm 1: Unlearning via Activations; Algorithm 2: Unlearning via Examples |
| Open Source Code | No | The paper mentions a third-party library: "We use the fvcore library for computing the number of FLOPs required during the forward passes" (https://github.com/facebookresearch/fvcore/). However, it does not provide any statement or link to the source code for the methodology described in this paper. |
| Open Datasets | Yes | We validate the proposed methods using experiments across four base datasets: CIFAR-10 with 10 distinct classes, CIFAR-100 (Krizhevsky et al., 2009) with 100 distinct classes, LACUNA-100 (Golatkar et al., 2020a) with 100 distinct classes and ImageNet-1k (Russakovsky et al., 2015) with 1000 distinct classes. LACUNA-100 is derived from VGG-Faces (Cao et al., 2018) by sampling 100 different celebrities and sampling 500 images per celebrity, out of which 400 are used as training data and the rest are used as test images. |
| Dataset Splits | Yes | Let D_train = {x_i, y_i}_{i=1}^N be a training dataset and D_test be the corresponding test dataset. In our experiments, we consider the setting of class unlearning, wherein we aim to unlearn a class c from a model trained with a multiclass classification objective on D_train. c is called the forget class or the forget set. Given c, we obtain D_train^forget ⊂ D_train such that D_train^forget = {(x, y) ∈ D_train \| y = c}. The complement of D_train^forget is D_train^retain, i.e., the subset of D_train that we wish to retain. Thus D_train^retain ∪ D_train^forget = D_train. Similarly, from D_test, we have D_test^forget = {(x, y) ∈ D_test \| y = c} and its complement D_test^retain. We refer to D_train^retain and D_test^retain as the retain set training and test data; and D_train^forget and D_test^forget as the forget set training and test data, respectively. Table 3: Performance of the models on different sets of data after the initial training on the four datasets. We use two kinds of models: (a) models having a Discrete KV Bottleneck (DKVB), which are used for the proposed methods, and (b) models where the DKVB and the decoder are replaced by a linear layer; these are used for the baseline. We wish to reduce the accuracy of these models on D_test^forget to 0% while maintaining the accuracy on D_test^retain. Experimental Setup: We perform the experiment for CIFAR-10 with a ViT-B/32 backbone. We divide the dataset into training data (D_Train), validation data (D_Val) and test data (D_Test). Training data consists of 4000 examples per class; validation and test data consist of 1000 examples per class. |
| Hardware Specification | Yes | We perform all of our experiments on a 48GB RTX8000 GPU. |
| Software Dependencies | No | The paper mentions using a "CLIP (Radford et al., 2021) pretrained ViT-B/32" and loading "torchvision.models.ResNet50_Weights". It also references the fvcore library. However, specific version numbers for software components like PyTorch, Python, or the aforementioned libraries are not provided. |
| Experiment Setup | Yes | We then train both model architectures on the full training sets of each dataset. Since the backbone is frozen, for the baseline models, only the weights of the linear layer are tuned during initial training (and later unlearning). Since we use only one linear layer, we do not do any pre-training (beyond the backbone), unlike in previous works (Kurmanji et al., 2023; Golatkar et al., 2020a;b). Table 3 shows the performance of these trained models on the train and test splits of the complete datasets. Tables 11, 12, 13, 14 provide hyperparameter details: Table 11: Hyperparameters used for training the base DKVB models; Table 12: Hyperparameters used for training the baseline models; Table 13: Hyperparameters for SCRUB + Linear Layer Experiments shown in Section 5.2.1; Table 14: Hyperparameters used for re-training experiments. |
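The class-unlearning split described in the Dataset Splits row (partitioning D_train into a forget set with label c and its retain-set complement) can be sketched in a few lines. This is a hypothetical helper for illustration — the paper does not release code, and the function name `class_unlearning_split` and the toy dataset are assumptions:

```python
def class_unlearning_split(dataset, forget_class):
    """Partition (x, y) pairs into the forget subset (label == forget_class)
    and the retain subset (its complement), as defined in the paper:
    D^forget = {(x, y) in D | y = c}, D^retain = D \\ D^forget."""
    forget = [(x, y) for x, y in dataset if y == forget_class]
    retain = [(x, y) for x, y in dataset if y != forget_class]
    return forget, retain

# Toy example: a 3-class dataset, forgetting class 2.
toy_train = [("img0", 0), ("img1", 1), ("img2", 2), ("img3", 2), ("img4", 0)]
forget_set, retain_set = class_unlearning_split(toy_train, forget_class=2)
print(len(forget_set), len(retain_set))  # 2 3
```

By construction the two subsets are disjoint and their union recovers D_train, matching the identity D^retain ∪ D^forget = D_train quoted above; the same split applied to D_test yields the forget/retain test sets used to measure the target 0% forget-set accuracy.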