Generalization through variance: how noise shapes inductive biases in diffusion models

Authors: John Vastola

ICLR 2025

Reproducibility assessment — variable, result, and LLM response:
Research Type: Theoretical. In this paper, we develop a mathematical theory that partly explains this "generalization through variance" phenomenon. Our theoretical analysis exploits a physics-inspired path-integral approach to compute the distributions typically learned by a few paradigmatic under- and overparameterized diffusion models. We find that the distributions diffusion models effectively learn to sample from resemble their training distributions, but with gaps filled in, and that this inductive bias is due to the covariance structure of the noisy target used during training.
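The "noisy target" referred to above is, in standard denoising score matching, the clean sample the network regresses toward given a noised input; because several clean samples can produce the same noised input, the loss-minimizing prediction is a posterior average, which smooths the learned distribution. The following is a minimal illustrative sketch of that mechanism (an assumed standard setup, not the paper's exact formulation), using a three-point 1D distribution like the paper's toy example:

```python
import numpy as np

# Toy training distribution: three point masses, as in the paper's 1D example.
data = np.array([-1.0, 0.0, 1.0])
sigma = 0.5  # noise scale at some fixed diffusion time (assumed value)

# During training the regression target is x0, given the noised input
# x_t = x0 + sigma * eps.  Since several x0 can generate the same x_t,
# the MSE-optimal prediction is the posterior mean E[x0 | x_t]:
def optimal_denoiser(xt, data=data, sigma=sigma):
    # Gaussian likelihood of xt under each data point (uniform prior),
    # normalized to posterior weights over the data points.
    w = np.exp(-0.5 * ((xt[:, None] - data[None, :]) / sigma) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ data

xt = np.linspace(-2, 2, 9)
print(optimal_denoiser(xt))
```

At inputs between the training points the posterior mean interpolates smoothly (e.g. it is roughly 0.5 at x_t = 0.5), so sampling against this target places mass between training points — the "gaps filled in" behavior. The spread of targets at a fixed noised input (their covariance) is what the abstract identifies as the source of the inductive bias.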
Researcher Affiliation: Academia. John J. Vastola, Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA. EMAIL
Pseudocode: No. The paper describes its mathematical derivations and theoretical concepts in prose and equations; it does not contain any structured pseudocode or algorithm blocks.
Open Source Code: Yes. See https://github.com/john-vastola/gtv-iclr25 for code that produces Figs. 1-3.
Open Datasets: No. The paper mentions CIFAR-10 and ImageNet-64 as datasets used by other state-of-the-art models for context, but its own theoretical analysis and illustrative figures (Figs. 1-3) use synthetic or toy data ("four example 2D data distributions", "a 1D data distribution {-1, 0, 1}", "a 2D data distribution"). No concrete access information is provided for these synthetic distributions, and the well-known datasets are not used for the paper's own results.
Dataset Splits: No. The paper's results are based on theoretical analysis and illustrative examples using synthetic data; standard train/test/validation splits are not mentioned for the data distributions used to generate the figures.
Hardware Specification: No. The paper does not mention any specific hardware (e.g., GPU or CPU models) used for its theoretical analyses or for generating the illustrative figures.
Software Dependencies: No. The paper does not explicitly state any software dependencies with specific version numbers. While code is provided, the paper text itself lacks this information.
Experiment Setup: No. The paper describes theoretical models and their parameters (e.g., N = 100 for the linear models, Gaussian features, Fourier features, the time cutoff ϵ, and the ratio F/P). However, these are parameters of the theoretical analysis and illustrative examples, not concrete hyperparameters or training configurations of the kind typically reported in an experimental-setup section for reproducibility.
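The parameters listed above (a feature count such as N = 100, Gaussian or Fourier features, and a feature-to-sample ratio F/P) suggest random-feature regression models of the denoising target. The sketch below is purely hypothetical — it illustrates the flavor of such a setup, not the paper's actual equations: F Gaussian bump features with random centers, fit by ridge regression on P noisy training pairs, where F/P controls over- versus underparameterization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical random-feature model of the denoising target.
# All values here are illustrative assumptions, not the paper's settings.
F, P, sigma, width = 100, 500, 0.5, 0.3

x0 = rng.choice([-1.0, 0.0, 1.0], size=P)    # toy training data (clean)
xt = x0 + sigma * rng.standard_normal(P)     # noised inputs
centers = rng.uniform(-2, 2, size=F)         # random Gaussian feature centers

def features(x):
    # F Gaussian bumps evaluated at each input point.
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

# Ridge-regression fit of the denoiser weights on the noisy pairs.
Phi = features(xt)
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(F), Phi.T @ x0)

grid = np.linspace(-2, 2, 9)
denoised = features(grid) @ w
print(np.round(denoised, 2))
```

With F < P the model is underparameterized and the fit approximates the posterior mean of the clean data given the noised input; varying F/P (or the noise level) changes how aggressively the learned denoiser smooths between the training points.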