Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Authors: Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach through various evaluation metrics and auxiliary tasks, outperforming in terms of specificity of attribution by over 60%.
Researcher Affiliation | Academia | Tong Xie (1), Haoyu Li (1), Andrew Bai (2), Cho-Jui Hsieh (2); (1) Department of Mathematics, (2) Department of Computer Science, University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Training Timesteps and Norm Ranking
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor a direct link to a code repository.
Open Datasets | Yes | Setup. We compute test-influence on 4 diffusion models trained on: 1) Tiny ImageNet, containing 100,000 images at 64 × 64 resolution (Le & Yang, 2015); 2) a CelebFaces Attributes (CelebA) subset containing 100,000 images at 128 × 128 resolution (Liu et al., 2015); 3) ArtBench-2, consisting of the post-impressionism and ukiyo-e subclasses (Liao et al., 2022; Zheng et al., 2023), each with 5,000 images at 64 × 64 resolution; and 4) CIFAR-10 (Krizhevsky et al., 2009), consisting of 50,000 images at 32 × 32 resolution.
Dataset Splits | No | The paper mentions combining specific subsets of CIFAR-10 and MNIST for training and using "test samples" from these, but does not specify how the test samples were split from the original datasets or the train/test proportions used.
Hardware Specification | Yes | Additionally, training and experimentations are conducted on Nvidia A10G and RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the Denoising Diffusion Implicit Models (DDIM) architecture and the stochastic gradient descent (SGD) and Adam optimizers, but does not provide version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, or Python itself).
Experiment Setup | Yes | The diffusion models utilized in the experiments are trained with the Denoising Diffusion Implicit Models (DDIM) (Song et al., 2020) architecture, using 1,000 denoising timesteps and 50 inference steps. Stochastic gradient descent (SGD) and Adam optimizers are used and tested for the attribution results.
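The experiment-setup row above mentions 1,000 denoising timesteps subsampled to 50 DDIM inference steps. As a minimal sketch of what that subsampling can look like (not the authors' code; the paper does not state the subsampling rule, so uniform striding is an illustrative assumption):

```python
def ddim_inference_timesteps(num_train_timesteps: int = 1000,
                             num_inference_steps: int = 50) -> list[int]:
    """Return a descending (noisy -> clean) subsequence of training timesteps.

    Uniform striding is one common choice for DDIM inference: with 1,000
    training timesteps and 50 inference steps, every 20th timestep is kept.
    """
    stride = num_train_timesteps // num_inference_steps
    return list(range(num_train_timesteps - stride, -1, -stride))

timesteps = ddim_inference_timesteps()
# 50 timesteps: 980, 960, ..., 20, 0
```

Libraries such as Hugging Face `diffusers` expose equivalent behavior through a scheduler's `set_timesteps` method; the function above only illustrates the timestep count and spacing implied by the setup quoted from the paper.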