Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion
Authors: Kulin Shah, Alkis Kalavasis, Adam Klivans, Giannis Daras
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our approach on various datasets and data settings, showcasing significantly reduced memorization and improved generation quality compared to natural baselines, in both the unconditional and text-conditional settings. We start our experimental evaluation by measuring the memorization and performance of unconditional diffusion models in several controlled settings. Specifically, we train models from scratch on CIFAR-10, FFHQ, and (tiny) ImageNet using 300, 1000, and 3000 training samples. For each one of these settings, we compute the Fréchet Inception Distance (FID) between 50,000 generated samples and 50,000 dataset samples as a measure of quality. |
| Researcher Affiliation | Academia | 1University of Texas at Austin 2Yale University 3Massachusetts Institute of Technology. |
| Pseudocode | Yes | A. Our Algorithm...Algorithm 1 Algorithm for training diffusion models using limited data. |
| Open Source Code | No | The paper does not provide an explicit statement about open-source code availability or a link to a code repository. |
| Open Datasets | Yes | Specifically, we train models from scratch on CIFAR-10, FFHQ, and (tiny) ImageNet using 300, 1000, and 3000 training samples. Following prior work (Somepalli et al., 2023), we finetune Stable Diffusion on 10k image-text pairs from a curated subset of LAION (Schuhmann et al., 2022). We use the Tiny ImageNet dataset, which consists of 200 classes (Le & Yang, 2015). |
| Dataset Splits | Yes | Specifically, we train models from scratch on CIFAR-10, FFHQ, and (tiny) ImageNet using 300, 1000, and 3000 training samples. For each one of these settings, we compute the Fréchet Inception Distance (FID) between 50,000 generated samples and 50,000 dataset samples as a measure of quality. For Tiny ImageNet, we sample 5 images randomly from each class to create a dataset consisting of 1,000 images. Similarly, we sample 15 images from each class to create a dataset consisting of 3,000 images. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and implementations from prior work (Karras et al., 2022; Somepalli et al., 2023; Wen et al., 2024) but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | For all of our experiments regarding unconditional generation, we use the Adam optimizer with a learning rate of 0.0001, betas (0.9, 0.999), an epsilon value of 1e-8, and a weight decay of 0.01. The models for FFHQ and CIFAR-10 are trained for 30,000 iterations with a batch size of 256, and the model for ImageNet is trained with a batch size of 512 for 80,000 iterations. |
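The per-class subsampling described under "Dataset Splits" (5 or 15 images from each of Tiny ImageNet's 200 classes) can be sketched as follows. This is not the authors' released code; the function name and the `images_by_class` mapping are hypothetical, since the paper only states how many images are drawn per class.

```python
import random

def sample_per_class(images_by_class, per_class, seed=0):
    """Draw `per_class` images from each class to build a small training set.

    `images_by_class` maps a class label to a list of image identifiers;
    this structure is an assumption for illustration -- the paper specifies
    only that 5 (or 15) images are sampled randomly per class.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    subset = []
    for label, images in sorted(images_by_class.items()):
        # rng.sample draws without replacement within each class
        subset.extend((label, img) for img in rng.sample(images, per_class))
    return subset
```

With 200 classes, 5 images per class yields the 1,000-image split described in the report, and 15 per class yields the 3,000-image split.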
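The unconditional-training hyperparameters quoted under "Experiment Setup" can be collected into a single configuration helper. This is a minimal sketch, not the authors' code: the function name is hypothetical, and it simply reproduces the reported values (Adam with lr 1e-4, betas (0.9, 0.999), eps 1e-8, weight decay 0.01, plus the dataset-specific batch sizes and iteration counts).

```python
def training_config(dataset):
    """Return the reported hyperparameters for a given dataset.

    Values are taken verbatim from the paper's experiment-setup quote;
    the dict layout and dataset keys are assumptions for illustration.
    """
    base = {
        "optimizer": "Adam",
        "lr": 1e-4,
        "betas": (0.9, 0.999),
        "eps": 1e-8,
        "weight_decay": 0.01,
    }
    if dataset in ("cifar10", "ffhq"):
        base.update(batch_size=256, iterations=30_000)
    elif dataset == "imagenet":
        base.update(batch_size=512, iterations=80_000)
    else:
        raise ValueError(f"no reported configuration for dataset: {dataset}")
    return base
```

Such a helper would typically be passed to an optimizer constructor (e.g. a PyTorch `torch.optim.Adam`) when reproducing the training runs.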