Data Unlearning in Diffusion Models

Authors: Silas Alberti, Kenan Hasanaliyev, Manav Shah, Stefano Ermon

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "When evaluated on CelebA-HQ and MNIST, SISS achieved Pareto optimality along the quality and unlearning strength dimensions. On Stable Diffusion, SISS successfully mitigated memorization on nearly 90% of the prompts we tested." "We evaluate our SISS method, its ablations, EraseDiff, NegGrad, and naive deletion through unlearning experiments on CelebA-HQ, MNIST T-Shirt, and Stable Diffusion."
Researcher Affiliation: Academia. Silas Alberti, Kenan Hasanaliyev, Manav Shah, Stefano Ermon (Stanford University).
Pseudocode: No. The paper defines loss functions and describes the method's mathematical properties, but it does not include a distinct pseudocode block or algorithm section.
Open Source Code: Yes. "We release our code online." Code: https://github.com/claserken/SISS
Open Datasets: Yes. "We demonstrate the effectiveness of SISS on CelebA-HQ (Karras et al., 2018), MNIST with T-Shirt, and Stable Diffusion. The base model for MNIST with T-Shirt was trained on MNIST (Deng, 2012) augmented with a specific T-shirt from Fashion-MNIST (Xiao et al., 2017)." Memorized prompts for Stable Diffusion v1.4 (trained on LAION) were drawn from Webster (2023).
Dataset Splits: No. The paper describes how datasets were augmented or synthetically generated for specific experiments (e.g., "sampling 128 images for each prompt and using a k-means classifier for labelling each image as memorized (A) or not (X \ A)") and how fine-tuning was performed. However, it does not provide explicit training/validation/test splits for the unlearning experiments themselves.
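The k-means labelling step quoted above can be sketched as two-cluster k-means over a scalar per-image score (e.g., each sampled image's distance to the memorized target image). The score values and the "lower score = memorized" convention are illustrative assumptions here, not the paper's exact procedure:

```python
def two_means_labels(scores, iters=50):
    """Cluster scalar per-image scores into two groups with k-means (k=2).

    Returns a list of booleans: True for images in the low-score cluster,
    assuming a lower score means closer to the memorized target image.
    """
    # Initialize the two centroids at the extremes of the score range.
    c0, c1 = min(scores), max(scores)
    for _ in range(iters):
        groups = ([], [])
        for s in scores:
            # Assign each score to its nearest centroid.
            groups[0 if abs(s - c0) <= abs(s - c1) else 1].append(s)
        # Recompute centroids as cluster means.
        new0 = sum(groups[0]) / len(groups[0]) if groups[0] else c0
        new1 = sum(groups[1]) / len(groups[1]) if groups[1] else c1
        if (new0, new1) == (c0, c1):  # converged
            break
        c0, c1 = new0, new1
    return [abs(s - c0) <= abs(s - c1) for s in scores]

# Hypothetical distances for 7 sampled images; low = near the memorized image.
labels = two_means_labels([0.1, 0.12, 0.09, 0.95, 0.9, 1.0, 0.11])
# → [True, True, True, False, False, False, True]
```

In the paper's setup this would be run per prompt over 128 sampled images, with the low-score cluster labelled memorized (A) and the rest X \ A.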
Hardware Specification: Yes. "A cluster of 8 NVIDIA H100 GPUs was used to execute large numbers of runs in parallel. In addition, a g5.xlarge instance with an NVIDIA A10G GPU on AWS, a personal home computer with an NVIDIA RTX 3090, and a cluster of 3 NVIDIA A4000 GPUs were the primary code development environments."
Software Dependencies: No. "All diffusion models were trained and fine-tuned using the Hugging Face diffusers package along with the Adam optimizer (Kingma & Ba, 2015)." The paper names these packages but does not specify their version numbers.
Experiment Setup: Yes. "Our pretrain and retrain unconditional MNIST T-Shirt DDPMs were trained for 250 epochs with a batch size of 128 images and a learning rate of 1e-4 with cosine decay. Both models used the same DDPM sampler at inference with 50 backwards steps. In the case of CelebA-HQ and Stable Diffusion, we did not perform the pretraining and chose batch sizes of 64 and 16 images with learning rates of 5e-6 and 1e-5, respectively."
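The reported "learning rate of 1e-4 with cosine decay" corresponds to the standard cosine-annealing schedule; the sketch below illustrates that formula only, and is not the authors' training code (which uses Hugging Face diffusers). The step-count arithmetic (60,000 MNIST images, batch size 128) is an assumed example:

```python
import math

def cosine_decay_lr(step, total_steps, base_lr=1e-4):
    """Cosine-annealed learning rate: base_lr at step 0, decaying to 0."""
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumed example: 250 epochs over ~60k MNIST images at batch size 128.
total_steps = 250 * (60_000 // 128)
lr_start = cosine_decay_lr(0, total_steps)            # 1e-4
lr_mid = cosine_decay_lr(total_steps // 2, total_steps)  # ~5e-5
lr_end = cosine_decay_lr(total_steps, total_steps)    # 0.0
```

In practice this schedule is typically obtained from a library helper (e.g., a cosine scheduler in diffusers or PyTorch) rather than written by hand.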