Towards a Mechanistic Explanation of Diffusion Model Generalization
Authors: Matthew Niedoba, Berend Zwartsenberg, Kevin Patrick Murphy, Frank Wood
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods. Figure 1 shows "Denoiser outputs given shared reverse process noisy inputs from CIFAR-10". Figure 2 plots the mean squared error (MSE) between network and optimal denoisers. Figure 7 presents "Comparison of various denoisers against DDPM++ over forward and reverse processes". Figure 9 shows "SSCD cosine similarity of CIFAR-10 PF-ODE samples". |
| Researcher Affiliation | Collaboration | 1University of British Columbia 2Inverted AI 3Alberta Machine Intelligence Institute. Correspondence to: Matthew Niedoba <EMAIL>. |
| Pseudocode | No | The paper describes methods and processes in narrative text and figures, such as Figure 5, which illustrates the Patch Set Posterior Composite, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our open-source implementation of PSPC and other denoisers is available at https://github.com/plai-group/pspc. |
| Open Datasets | Yes | We evaluate our method on three image datasets: CIFAR-10 (Krizhevsky et al., 2009), FFHQ 64x64 (Karras et al., 2019), and AFHQv2 64x64 (Choi et al., 2020). |
| Dataset Splits | No | The paper mentions generating evaluation sets of samples (10,000 z for CIFAR-10 and 2,000 z for FFHQ/AFHQ) for the forward and reverse processes, but it does not provide training, validation, or test splits for the original datasets (CIFAR-10, FFHQ, AFHQ), nor does it state how the 200 million examples used for DiT training were split. |
| Hardware Specification | No | The Acknowledgements section mentions computational resources provided by the Digital Research Alliance of Canada Compute Canada (alliancecan.ca), the Advanced Research Computing at the University of British Columbia (arc.ubc.ca), and Amazon. However, it does not specify any particular GPU models, CPU types, or other hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | Appendix A.1, Table 2, titled 'Hyperparameters for DiT training on CIFAR-10', provides specific values for Batch Size (512), Learning Rate (0.0001), β1 (0.9), β2 (0.999), ϵ (1E-8), Patch Size (4), # Heads (12), Hidden Size (768), Transformer Blocks (12), Dropout Ratio (0.12), and Augmentation Rate (0.12). |
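The hyperparameters reported in Appendix A.1, Table 2 can be collected into a single configuration sketch for anyone attempting a replication. This is a minimal illustration only: the dictionary keys, the `DIT_CIFAR10_CONFIG` name, and the `adam_kwargs` helper are our assumptions, not identifiers from the paper's released code; only the numeric values come from Table 2.

```python
# Values from Appendix A.1, Table 2 ("Hyperparameters for DiT training
# on CIFAR-10"). Key names are illustrative, not from the paper's code.
DIT_CIFAR10_CONFIG = {
    "batch_size": 512,
    "learning_rate": 1e-4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_eps": 1e-8,
    "patch_size": 4,
    "num_heads": 12,
    "hidden_size": 768,
    "transformer_blocks": 12,
    "dropout_ratio": 0.12,
    "augmentation_rate": 0.12,
}

def adam_kwargs(cfg):
    """Map the config onto the keyword arguments accepted by
    torch.optim.Adam (lr, betas, eps)."""
    return {
        "lr": cfg["learning_rate"],
        "betas": (cfg["adam_beta1"], cfg["adam_beta2"]),
        "eps": cfg["adam_eps"],
    }
```

For example, a replication could pass `adam_kwargs(DIT_CIFAR10_CONFIG)` to `torch.optim.Adam(model.parameters(), **kwargs)`; since the paper names only the Adam optimizer without library versions, the framework binding remains an open replication detail.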