Targeted Unlearning with Single Layer Unlearning Gradient
Authors: Zikui Cai, Yaoteng Tan, M. Salman Asif
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first provide a brief overview of the experimental setup for each experiment. We then present key results demonstrating the effectiveness of SLUG in unlearning CLIP, Stable Diffusion, and vision-language models. Our experiments demonstrate SLUG's effectiveness across all three key objectives. For efficiency, we achieve state-of-the-art results on the UnlearnCanvas benchmark while requiring only a fraction of the computational resources and minimal storage. For precision, we show minimal impact on related concepts and image quality. In terms of robustness, we evaluate against recent vulnerabilities identified by Zhang et al. (2025) and Petsiuk & Saenko (2025), demonstrating the effectiveness of our method. |
| Researcher Affiliation | Academia | 1University of California Riverside, Riverside, CA, USA 2University of Maryland, College Park, MD, USA. Correspondence to: M. Salman Asif <EMAIL>. |
| Pseudocode | Yes | We have included pseudocode for SLUG in Section B. In this section, we present the pseudocode for our method, SLUG, in Algorithm 1, the search process for Pareto-optimal layers in Algorithm 2, and the binary search for the optimal unlearning step size in Algorithm 3. |
| Open Source Code | Yes | Our code is available at https://github.com/CSIPlab/SLUG. |
| Open Datasets | Yes | We used publicly-available datasets to construct the forget, retain, and validation sets. For identity unlearning, we curated the forget set by filtering the LAION-400M dataset (Schuhmann et al., 2021)... To assess unlearning effectiveness, we used the CelebA dataset (Liu et al., 2015)... Utility of post-unlearning models was evaluated with the ImageNet dataset. UnlearnCanvas (Zhang et al., 2024d) was used to test unlearning of artistic styles and objects in Stable Diffusion. |
| Dataset Splits | Yes | For validation at each search step, we use 5% of the test set for CLIP, 10 test-time generated images (not present in the forget training set) for SD, and a 10-image subset per identity for VLM unlearning. In Table 5, we provide the eval runtime and effectiveness of SLUG versus different validation set sizes, following the setup of Table 1 on CLIP unlearning. Note that our original choice of 5% validation size already provides a good test accuracy on ImageNet, close to that of the original model (which achieves 60.12%). |
| Hardware Specification | Yes | Note that while the details of the evaluation of efficiency metrics are not well defined in the original UnlearnCanvas, in Table 2 we report the best performance SLUG can achieve on our computing resource, an NVIDIA A100 40GB GPU. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers. It mentions models like CLIP, Stable Diffusion, and VLMs, implying the use of deep learning frameworks, but no versions for Python, PyTorch, or other libraries are given. |
| Experiment Setup | Yes | Our SLUG framework requires no manual hyperparameter tuning. We use binary search to determine the step size λ for the one-step unlearning update (see Algorithm 3) that optimizes the trade-off between unlearning and retention metrics on a small validation subset. Across all experiments, we fix the number of binary search steps to S = 10. |
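The two procedures quoted above — a one-step update applied to a single layer, and a binary search (S = 10 steps) for the step size λ that trades off unlearning against retention — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the linear retention proxy, and the sign convention of the update are assumptions; the paper's Algorithms 1 and 3 define the exact gradient computation and metrics.

```python
import numpy as np


def single_layer_update(weights, layer, forget_grad, lam):
    # SLUG-style edit: modify only the selected layer, leave all others
    # untouched. (Sketch: the exact gradient and sign convention follow
    # the paper's Algorithm 1, not this illustration.)
    updated = {name: w.copy() for name, w in weights.items()}
    updated[layer] = weights[layer] - lam * forget_grad
    return updated


def search_step_size(retain_metric, threshold, lam_max=1.0, steps=10):
    # Binary search (S = 10 steps, as fixed in the paper) for the largest
    # step size whose retention metric stays above a threshold, assuming
    # the metric decreases monotonically as the step size grows.
    lo, hi = 0.0, lam_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if retain_metric(mid) >= threshold:
            lo = mid  # retention still acceptable: unlearn harder
        else:
            hi = mid  # too much utility lost: back off
    return lo


# Toy example: a linear proxy for validation retention accuracy vs. step
# size (hypothetical numbers, chosen so the crossing point is lambda = 0.5).
lam = search_step_size(lambda l: 0.60 - 0.30 * l, threshold=0.45)

# Apply the one-step update to a single (hypothetical) layer.
weights = {"vision.layer7": np.ones((2, 2)), "vision.layer8": np.ones((2, 2))}
edited = single_layer_update(weights, "vision.layer7", np.ones((2, 2)), lam)
```

The key design point this illustrates is why SLUG's storage cost is small: only one layer's parameters change, so a checkpoint diff is a single tensor rather than a full model copy.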