CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
Authors: Joshua Kazdan, Hao Sun, Jiaqi Han, Felix Petersen, Frederick Vu, Stefano Ermon
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run three distinct sets of experiments to demonstrate the ways in which CPSample protects the privacy of the training data. First, we statistically test the ability of CPSample to reduce similarity between generated data and the training set for unguided diffusion. We then demonstrate that CPSample can prevent Stable Diffusion from generating memorized images. Finally, we measure robustness against membership inference attacks. Hyperparameters, in all empirical tests, are chosen to maximize image quality while eliminating exact matches. In our tests of image quality, we find that CPSample far outperforms existing methods of protecting the training data. |
| Researcher Affiliation | Academia | Joshua Kazdan¹, Hao Sun², Jiaqi Han², Felix Petersen², Frederick Vu³, Stefano Ermon² — ¹Department of Statistics, Stanford University; ²Department of Computer Science, Stanford University; ³Department of Mathematics, UCLA |
| Pseudocode | Yes | Algorithm 1: Test statistic for membership inference attack against diffusion models (Matsumoto et al., 2023a). Input: target samples x₁, ..., xₘ; CPSample denoiser ε̂_{θ,ϕ}; noise schedule α_t = ∏_{s=1}^{t} (1 − β_s). Initialize total_error ← 0. For x in {x₁, ..., xₘ}: total_error ← total_error + ‖ε − ε̂_{θ,ϕ}(√α_t · x + √(1 − α_t) · ε, t)‖². End for. mean_error ← total_error / m. |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT We provide source code, scripts, and configuration details for experiments in the supplementary material for those seeking to reproduce this study. Proofs of original claims are given in the appendix, along with details for the implementation and training of the model. Statistical measures of significance are included to ensure the robustness of our results. |
| Open Datasets | Yes | achieving state-of-the-art FID scores on CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), ImageNet (Deng et al., 2009), and other touchstone datasets. |
| Dataset Splits | Yes | To evaluate resistance to inference attacks, we use a model trained on the entire set of 50 000 CIFAR-10 training images. We compare the reconstruction loss on these 50 000 training images to the reconstruction loss on the 10 000 withheld test samples included in the CIFAR-10 dataset. |
| Hardware Specification | Yes | The training of each classifier model was conducted using 4 NVIDIA A4000 GPUs with 16GB of memory. ... We employed 2 NVIDIA A5000 GPUs with 24GB of memory for fine-tuning each model on the subsets. |
| Software Dependencies | No | The paper mentions several models/frameworks like U-Net, Adam optimizer, DINO (Caron et al., 2021), FAISS model (Douze et al., 2024), and EDM (Karras et al., 2022). However, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, or CUDA, which are crucial for full reproducibility. |
| Experiment Setup | Yes | Table 5: Training Parameters — CIFAR-10: batch size 256, LR 2e-4, Adam, EMA rate 0.9999, 560 000 classifier steps, 110 000 fine-tune steps. CelebA: batch size 128, LR 2e-4, Adam, EMA rate 0.9999, 610 000 classifier steps, 150 000 fine-tune steps. LSUN Church: batch size 8, LR 2e-5, Adam, EMA rate 0.999, 1 250 000 classifier steps, 880 000 fine-tune steps. |
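The membership-inference test statistic quoted in the Pseudocode row (Algorithm 1, Matsumoto et al., 2023a) can be sketched in PyTorch. This is a minimal sketch, not the authors' released code: the `denoiser` callable, the fixed evaluation timestep `t`, and the one-noise-draw-per-sample loop are assumptions filling in details the pseudocode leaves open.

```python
import torch

def mia_test_statistic(samples, denoiser, alphas_cumprod, t):
    """Mean noise-reconstruction error over target samples.

    samples        -- tensor of target samples x_1..x_m, shape (m, ...)
    denoiser       -- noise predictor eps_hat(x_t, t) (e.g. a CPSample denoiser)
    alphas_cumprod -- tensor with alpha_t = prod_{s<=t} (1 - beta_s) per timestep
    t              -- timestep index at which to evaluate the error
    """
    alpha_t = alphas_cumprod[t]
    total_error = 0.0
    for x in samples:
        eps = torch.randn_like(x)  # fresh Gaussian noise per sample
        # forward-noise the sample: sqrt(alpha_t) * x + sqrt(1 - alpha_t) * eps
        x_t = alpha_t.sqrt() * x + (1.0 - alpha_t).sqrt() * eps
        # accumulate || eps - eps_hat(x_t, t) ||^2
        total_error += (eps - denoiser(x_t, t)).square().sum().item()
    return total_error / len(samples)
```

A lower mean error on a candidate set than on held-out data suggests those samples were in the training set; the paper's split-based evaluation (50 000 CIFAR-10 training images vs. 10 000 withheld test images) compares exactly this kind of reconstruction loss between the two groups.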