Towards a Mechanistic Explanation of Diffusion Model Generalization
Authors: Matthew Niedoba, Berend Zwartsenberg, Kevin Patrick Murphy, Frank Wood
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods. Figure 1 shows "Denoiser outputs given shared reverse process noisy inputs from CIFAR-10". Figure 2 plots the mean squared error (MSE) between network and optimal denoisers. Figure 7 presents "Comparison of various denoisers against DDPM++ over forward and reverse processes". Figure 9 shows "SSCD cosine similarity of CIFAR-10 PF-ODE samples". |
| Researcher Affiliation | Collaboration | 1University of British Columbia 2Inverted AI 3Alberta Machine Intelligence Institute. Correspondence to: Matthew Niedoba <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes in narrative text and figures, such as Figure 5, which illustrates the Patch Set Posterior Composite, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our open-source implementation of PSPC and other denoisers is available at https://github.com/plai-group/pspc. |
| Open Datasets | Yes | We evaluate our method on three image datasets: CIFAR-10 (Krizhevsky et al., 2009), FFHQ 64x64 (Karras et al., 2019), and AFHQv2 64x64 (Choi et al., 2020). |
| Dataset Splits | No | The paper mentions generating evaluation sets of samples (10,000 z for CIFAR-10 and 2,000 z for FFHQ/AFHQ) for the forward and reverse processes, but it does not provide training, validation, or test splits for the original datasets (CIFAR-10, FFHQ, AFHQ), nor does it state how the 200 million examples used for DiT training were split. |
| Hardware Specification | No | The Acknowledgements section mentions computational resources provided by the Digital Research Alliance of Canada Compute Canada (alliancecan.ca), the Advanced Research Computing at the University of British Columbia (arc.ubc.ca), and Amazon. However, it does not specify any particular GPU models, CPU types, or other hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | Appendix A.1, Table 2, titled 'Hyperparameters for DiT training on CIFAR-10', provides specific values for Batch Size (512), Learning Rate (0.0001), β1 (0.9), β2 (0.999), ϵ (1E-8), Patch Size (4), # Heads (12), Hidden Size (768), Transformer Blocks (12), Dropout Ratio (0.12), and Augmentation Rate (0.12). |
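The hyperparameters reported in Appendix A.1, Table 2 can be collected into a single configuration sketch for anyone attempting a replication. This is a minimal illustration only: the dictionary keys, the `DIT_CIFAR10_CONFIG` name, and the `adam_kwargs` helper are our assumptions, not identifiers from the paper's released code; only the numeric values come from Table 2.

```python
# Values from Appendix A.1, Table 2 ("Hyperparameters for DiT training
# on CIFAR-10"). Key names are illustrative, not from the paper's code.
DIT_CIFAR10_CONFIG = {
    "batch_size": 512,
    "learning_rate": 1e-4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_eps": 1e-8,
    "patch_size": 4,
    "num_heads": 12,
    "hidden_size": 768,
    "transformer_blocks": 12,
    "dropout_ratio": 0.12,
    "augmentation_rate": 0.12,
}

def adam_kwargs(cfg):
    """Map the config onto the keyword arguments accepted by
    torch.optim.Adam (lr, betas, eps)."""
    return {
        "lr": cfg["learning_rate"],
        "betas": (cfg["adam_beta1"], cfg["adam_beta2"]),
        "eps": cfg["adam_eps"],
    }
```

For example, a replication could pass `adam_kwargs(DIT_CIFAR10_CONFIG)` to `torch.optim.Adam(model.parameters(), **kwargs)`; since the paper names only the Adam optimizer without library versions, the framework binding remains an open replication detail.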