Influence Functions for Scalable Data Attribution in Diffusion Models

Authors: Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard E Turner

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically ablate variations of the GGN approximation and other design choices in our framework and show that our proposed method outperforms the existing data attribution methods for diffusion models as measured by common data attribution metrics like the Linear Datamodeling Score (Park et al., 2023) or retraining without top influences.
Researcher Affiliation Academia 1Department of Engineering, University of Cambridge, UK 2Max Planck Institute for Intelligent Systems, Tübingen, Germany 3Department of Computer Science, University of Toronto, Canada 4Vector Institute, Toronto, Canada 5Department of Computer Science, ETH Zurich, Switzerland 6Mila Quebec AI Institute, Montreal, Canada 7The Alan Turing Institute, London, UK
Pseudocode Yes Algorithm 1 K-FAC Influence Computation (Single-Use) ... Algorithm 2 K-FAC Influence Computation (Continual Deployment Setting)
Open Source Code Yes Source code available at https://github.com/BrunoKM/diffusion-influence
Open Datasets Yes We primarily focus on Denoising Diffusion Probabilistic Models (DDPM) (Ho et al., 2020) throughout. J.1 DATASETS We focus on the following datasets in this paper: CIFAR-10 CIFAR-10 is a dataset of small RGB images of size 32 × 32 (Krizhevsky, 2009). We use 50000 images (the train split) for training. CIFAR-2 For CIFAR-2, we follow Zheng et al. (2024) and create a subset of CIFAR-10 with 5000 examples of images corresponding only to the classes car and horse. 2500 examples of class car and 2500 examples of class horse are randomly subsampled without replacement from among all CIFAR-10 images of that class. ArtBench-10 The ArtBench-10 dataset (Liao et al., 2022) is a dataset of 60000 artworks from 10 artistic styles. The RGB images of the artworks are standardised to a 256 × 256 resolution. We use the full original train split (50000 examples) from the original paper (Liao et al., 2022) for our experiments.
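The CIFAR-2 construction quoted above (2500 car plus 2500 horse images, subsampled without replacement from CIFAR-10) can be sketched as follows. This is an illustrative sketch, not the paper's code: the function name is invented, the labels are synthetic stand-ins, and the CIFAR-10 class indices (1 = automobile, 7 = horse) are an assumed mapping.

```python
import random

def make_cifar2_indices(labels, class_a, class_b, per_class=2500, seed=0):
    """Subsample `per_class` examples of each of two classes without
    replacement, mimicking the CIFAR-2 construction described above.
    (Hypothetical helper; not from the paper's released code.)"""
    rng = random.Random(seed)
    idx_a = rng.sample([i for i, y in enumerate(labels) if y == class_a], per_class)
    idx_b = rng.sample([i for i, y in enumerate(labels) if y == class_b], per_class)
    return sorted(idx_a + idx_b)

# Synthetic CIFAR-10-like labels: 10 classes with 5000 training examples each.
labels = [c for c in range(10) for _ in range(5000)]
# Assumed CIFAR-10 class indices: 1 = automobile ("car"), 7 = horse.
subset = make_cifar2_indices(labels, class_a=1, class_b=7)
```

Sampling each class separately guarantees the exact 2500/2500 class balance the paper describes, which a single 5000-example draw over both classes would not.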
Dataset Splits Yes For all CIFAR LDS benchmarks (Park et al., 2023), we sample 100 sub-sampled datasets (M := 100 in Equation (13)), and we train 5 models with different random seeds (K := 5 in Equation (13)), each with 50% of the examples in the full dataset, for a total of 500 retrained models for each benchmark.
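The subset-sampling step behind those numbers (M := 100 random subsets, each keeping 50% of the training set, later trained with K := 5 seeds each) can be sketched as below; the function name and defaults are illustrative assumptions, not the paper's actual benchmark code.

```python
import random

def lds_subsets(n_train, num_subsets=100, frac=0.5, seed=0):
    """Draw the M random subsets of the training set used for an LDS
    benchmark: each subset keeps `frac` of the examples, sampled
    without replacement. (Hypothetical sketch of the setup above.)"""
    rng = random.Random(seed)
    size = int(frac * n_train)
    return [sorted(rng.sample(range(n_train), size)) for _ in range(num_subsets)]

subsets = lds_subsets(n_train=5000)  # e.g. CIFAR-2, which has 5000 examples
# Training K := 5 seeds on each of the 100 subsets yields the
# 100 * 5 = 500 retrained models reported for each benchmark.
```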
Hardware Specification Yes All experiments were run on a single NVIDIA A100 GPU.
Software Dependencies No The paper mentions the 'curvlinops (Dangel et al., 2025) package', 'PyTorch nn.Conv2d and nn.Linear modules', the 'AdamW optimiser', a 'cosine learning-rate schedule', an 'exponential moving average (EMA)', and the 'https://github.com/huggingface/diffusers library'. However, no specific version numbers are provided for any of these software components.
Experiment Setup Yes For CIFAR-10, we train for 160000 steps (compared to 800000 in Ho et al. (2020)) for the full model, and 80000 steps for the subsampled datasets (≈410 epochs in each case). On CIFAR-2, we train for 32000 steps for the model trained on the full dataset, and 16000 steps for the subsampled datasets (≈800 epochs in each case). We also use a cosine learning-rate schedule for the CIFAR-2 models. For ArtBench-10, ... We follow the training procedure in Rombach et al. (2022) and train the full model for 200000 training iterations, and the models trained on the subsampled data for 60000 iterations. We use linear warm-up for the learning-rate schedule for the first 5% of the training steps. We use the AdamW optimiser with a learning rate of 10⁻⁴, weight decay of 10⁻⁶, gradient-norm clipping of 1, and an exponential moving average (EMA) with a maximum decay rate of 0.9999 and an EMA warm-up exponential factor of 0.75 (see the https://github.com/huggingface/diffusers library for details on the EMA parameters).
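The EMA settings above (maximum decay 0.9999, warm-up exponent 0.75) can be made concrete with a sketch of the warm-up schedule in the style of the `diffusers` EMAModel, where the decay ramps from 0 toward the maximum as training progresses. The `inv_gamma` default of 1.0 is an assumption here, not a value stated in the paper.

```python
def ema_decay(step, max_decay=0.9999, power=0.75, inv_gamma=1.0):
    """EMA warm-up schedule in the style of the `diffusers` EMAModel:
    decay = 1 - (1 + step / inv_gamma) ** (-power), capped at max_decay.
    (inv_gamma=1.0 is an assumed default, not taken from the paper.)"""
    value = 1.0 - (1.0 + step / inv_gamma) ** (-power)
    return min(max_decay, value)

# Early in training the decay is small, so the EMA closely tracks the raw
# weights; late in training it saturates at the maximum decay rate.
early, late = ema_decay(10), ema_decay(1_000_000)
```

The warm-up matters because starting at a 0.9999 decay from step 0 would anchor the EMA to the random initial weights for a long time; ramping the decay lets early averages forget quickly.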