Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Authors: Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish, Kumar Dubey, Snigdha Chaturvedi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we outline the experimental setup and evaluate the unlearning performance of S3T across various setups. The goal of the experiments is to evaluate the functioning of individual components and the unlearning capabilities of S3T. Specifically, we design experiments to answer the following research questions: (RQ1) Does S3T training (Section 3.2) impact the model's performance compared to full training? (RQ2) Does S3T enhance the deletion capabilities of unlearning, and what is its cost tradeoff? (RQ3) Is the sequence permutation selection algorithm (Section 3.3) effective in practice? ... In Figure 5, we report the performance of fine-tuning on vision, GLUE, and Super GLUE benchmarks."
Researcher Affiliation | Collaboration | "1UNC Chapel Hill, 2Google DeepMind, 3Columbia University, 4Independent, 5Google Research. EMAIL, EMAIL"
Pseudocode | Yes | "Algorithm 1: Iterative Cyclic Rotation; Algorithm 2: BMS Sequence Selection Algorithm; Algorithm 3: S3T Training Procedure; Algorithm 4: S3T Deletion Procedure"
Open Source Code | Yes | https://github.com/brcsomnath/S3T
Open Datasets | Yes | "We use ViT-BASE (Dosovitskiy et al., 2020) (for CIFAR10 & CIFAR100 (Krizhevsky et al., 2009)), ViT-LARGE (for Tiny ImageNet (Le & Yang, 2015)), and RoBERTa-LARGE (Liu et al., 2019) (for GLUE (Wang et al., 2018) & Super GLUE (Wang et al., 2019)) ... instruction tuning of Llama2-7B (Touvron et al., 2023), Llama2-13B, and Llama3-8B using the Alpaca dataset (Taori et al., 2023)."
Dataset Splits | Yes | "We use ViT-BASE (Dosovitskiy et al., 2020) (for CIFAR10 & CIFAR100 (Krizhevsky et al., 2009)), ViT-LARGE (for Tiny ImageNet (Le & Yang, 2015)), and RoBERTa-LARGE (Liu et al., 2019) (for GLUE (Wang et al., 2018) & Super GLUE (Wang et al., 2019)) ... We report the zero-shot performance for all datasets."
Hardware Specification | Yes | "Our experiments were run on NVIDIA A6000 GPUs."
Software Dependencies | No | "We perform all experiments using PyTorch (Paszke et al., 2019) and the Huggingface (Wolf et al., 2019) framework. Our experiments were run on NVIDIA A6000 GPUs."
Experiment Setup | Yes | "In Table 2, we report the common set of hyperparameters for S3T fine-tuning experiments. All hyperparameters were set using a grid search with the Weights & Biases framework. We use an AdamW optimizer with the corresponding learning rates for each dataset (reported in Table 2). During fine-tuning of the models, we perform full-precision training for all settings except instruction tuning where we use 8-bit training."
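The quoted setup says hyperparameters were chosen by grid search over per-dataset values (learning rate, etc.). As a minimal sketch of what such a sweep looks like, the snippet below enumerates all combinations of a hypothetical grid and keeps the best-scoring configuration. The grid values and the `toy_score` function are illustrative placeholders, not the paper's actual hyperparameters or objective.

```python
from itertools import product

def grid_search(grids, score_fn):
    """Exhaustively score every hyperparameter combination in `grids`
    and return the best (config, score) pair."""
    names = list(grids)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grids[n] for n in names)):
        cfg = dict(zip(names, values))
        s = score_fn(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Toy scoring function: stands in for validation accuracy after a
# fine-tuning run. It simply prefers lr=2e-5 and batch_size=32
# (purely illustrative values, not from the paper).
def toy_score(cfg):
    return -abs(cfg["lr"] - 2e-5) * 1e5 - abs(cfg["batch_size"] - 32) / 32

# Hypothetical grid, in the spirit of the per-dataset search described above.
grids = {"lr": [1e-5, 2e-5, 5e-5], "batch_size": [16, 32, 64]}
best_cfg, best_score = grid_search(grids, toy_score)
# best_cfg -> {"lr": 2e-5, "batch_size": 32}
```

In practice the paper runs this kind of sweep through the Weights & Biases framework rather than a hand-rolled loop, and the score would come from an actual fine-tuning run per configuration.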