Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

Authors: Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach through various evaluation metrics and auxiliary tasks, outperforming in terms of specificity of attribution by over 60%.
Researcher Affiliation | Academia | Tong Xie (1), Haoyu Li (1), Andrew Bai (2), Cho-Jui Hsieh (2); (1) Department of Mathematics, (2) Department of Computer Science, University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Training Timesteps and Norm Ranking
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor a direct link to a code repository.
Open Datasets | Yes | Setup. We compute test-influence on 4 diffusion models trained on: 1) Tiny ImageNet, containing 100,000 images at 64 × 64 resolution (Le & Yang, 2015); 2) a CelebFaces Attributes (CelebA) subset containing 100,000 images at 128 × 128 resolution (Liu et al., 2015); 3) ArtBench-2, consisting of the post-impressionism and ukiyo-e subclasses (Liao et al., 2022; Zheng et al., 2023), each with 5,000 images at 64 × 64 resolution; and 4) CIFAR-10 (Krizhevsky et al., 2009), consisting of 50,000 images at 32 × 32 resolution.
Dataset Splits | No | The paper mentions combining specific subsets of CIFAR-10 and MNIST for training and using "test samples" from these, but does not specify how the test samples were split from the original datasets or the train/test proportions used.
Hardware Specification | Yes | Additionally, training and experimentations are conducted on Nvidia A10G and RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the Denoising Diffusion Implicit Models (DDIM) architecture and the stochastic gradient descent (SGD) and Adam optimizers, but does not provide version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, or Python itself).
Experiment Setup | Yes | The diffusion models utilized in the experiments are trained with the Denoising Diffusion Implicit Models (DDIM) (Song et al., 2020) architecture, using 1,000 denoising timesteps and 50 inference steps. Stochastic gradient descent (SGD) and Adam optimizers are used and tested for the attribution results.
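The experiment-setup row above mentions 1,000 denoising timesteps subsampled to 50 DDIM inference steps. As a minimal sketch of what that subsampling can look like (not the authors' code; the paper does not state the subsampling rule, so uniform striding is an illustrative assumption):

```python
def ddim_inference_timesteps(num_train_timesteps: int = 1000,
                             num_inference_steps: int = 50) -> list[int]:
    """Return a descending (noisy -> clean) subsequence of training timesteps.

    Uniform striding is one common choice for DDIM inference: with 1,000
    training timesteps and 50 inference steps, every 20th timestep is kept.
    """
    stride = num_train_timesteps // num_inference_steps
    return list(range(num_train_timesteps - stride, -1, -stride))

timesteps = ddim_inference_timesteps()
# 50 timesteps: 980, 960, ..., 20, 0
```

Libraries such as Hugging Face `diffusers` expose equivalent behavior through a scheduler's `set_timesteps` method; the function above only illustrates the timestep count and spacing implied by the setup quoted from the paper.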