On the Feature Learning in Diffusion Models

Authors: Andi Han, Wei Huang, Yuan Cao, Difan Zou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical analysis demonstrates that diffusion models, due to the denoising objective, are encouraged to learn more balanced and comprehensive representations of the data. In contrast, neural networks with a similar architecture trained for classification tend to prioritize learning specific patterns in the data, often focusing on easy-to-learn components. To support these theoretical insights, we conduct several experiments on both synthetic and real-world datasets, which empirically validate our findings and highlight the distinct feature learning dynamics in diffusion models compared to classification.
Researcher Affiliation | Academia | RIKEN AIP (EMAIL, EMAIL); Department of Statistics and Actuarial Science, University of Hong Kong (EMAIL); Department of Computer Science and Institute of Data Science, University of Hong Kong (EMAIL). Equal contribution.
Pseudocode | No | The paper describes theoretical frameworks and experimental setups but does not include any clearly labeled pseudocode or algorithm blocks. It presents mathematical derivations and logical steps in paragraph form, particularly in the proof overview and appendix sections, rather than structured algorithm displays.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository in the main text or appendix.
Open Datasets | Yes | We conduct both synthetic and real-world experiments to verify our theoretical claims. ... We also conduct experiments on the MNIST dataset (Lecun et al., 1998) to support our theory. In order to better control the signal-to-noise ratio, we create a Noisy-MNIST dataset, where we treat each original MNIST image as a clean signal patch and concatenate a standard Gaussian noise patch with the same size, i.e., 28 × 28.
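The Noisy-MNIST construction quoted above pairs each clean 28 × 28 image with a same-size standard Gaussian noise patch. A minimal sketch of that construction, assuming the two patches are concatenated side by side (the paper specifies the patch sizes but not the axis, so the layout here is an assumption):

```python
import numpy as np

def make_noisy_mnist(images, rng=None):
    """Concatenate each 28x28 image with a same-size standard Gaussian
    noise patch, as in the described Noisy-MNIST construction.
    The concatenation axis (width) is an assumption."""
    rng = np.random.default_rng(rng)
    images = np.asarray(images, dtype=np.float64)   # shape (n, 28, 28)
    noise = rng.standard_normal(images.shape)       # N(0, 1) patch per image
    return np.concatenate([images, noise], axis=-1) # shape (n, 28, 56)
```

Selecting 50 samples each from digits 0 and 1 (n = 100), as described, would then just subsample the MNIST arrays before calling this helper.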
Dataset Splits | Yes | Setup. We follow Definition 2.1 to generate a synthetic dataset for both diffusion model and classification. Specifically, we set data dimension d = 1000 and let µ1 = [µ, 0, ..., 0] ∈ ℝ^d and µ−1 = [0, µ, 0, ..., 0] ∈ ℝ^d. We sample the noise patch ξi ∼ N(0, Id), i ∈ [n] (i.e., σξ = 1). We set sample size and network width to be n = 30 and m = 20... The (in-distribution) test accuracy is computed with 3000 test samples. ... We also conduct experiments on the MNIST dataset (Lecun et al., 1998) to support our theory. In order to better control the signal-to-noise ratio, we create a Noisy-MNIST dataset... We select 50 samples each from digits 0 and 1 respectively (i.e., n = 100).
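The quoted setup fixes d = 1000, the two signal vectors µ1 and µ−1, and noise patches ξi ∼ N(0, I_d). A hedged sketch of that generator follows; the paper only specifies these vectors, so the exact packaging of signal and noise patches into one sample is an assumption here:

```python
import numpy as np

def sample_synthetic(n=30, d=1000, mu=5.0, rng=None):
    """Sketch of the Definition-2.1-style synthetic data: each sample
    pairs a label-dependent signal patch mu_y with a Gaussian noise
    patch xi ~ N(0, I_d). The (signal, noise) patch layout is an
    assumption; only the vectors themselves come from the paper."""
    rng = np.random.default_rng(rng)
    mu_pos = np.zeros(d); mu_pos[0] = mu    # mu_{+1} = [mu, 0, ..., 0]
    mu_neg = np.zeros(d); mu_neg[1] = mu    # mu_{-1} = [0, mu, 0, ..., 0]
    y = rng.choice([1, -1], size=n)         # balanced random labels
    signal = np.where(y[:, None] == 1, mu_pos, mu_neg)
    noise = rng.standard_normal((n, d))     # xi_i ~ N(0, I_d), sigma_xi = 1
    X = np.stack([signal, noise], axis=1)   # shape (n, 2 patches, d)
    return X, y
```

With µ = 5 this matches the low-SNR regime quoted below (n·SNR² = 30 · 25/1000 = 0.75); µ = 15 gives the high-SNR regime.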
Hardware Specification | No | The paper describes the experimental methodology and setup but does not specify any particular hardware used for running the experiments, such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper mentions the use of 'gradient descent' and 'neural networks' but does not provide specific version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used in the implementation.
Experiment Setup | Yes | We set sample size and network width to be n = 30 and m = 20 and initialize the weights to be Gaussian with a standard deviation σ0 = 0.001. We vary the choice of µ to create two problem settings: (1) low SNR with µ = 5, which leads to n·SNR² = 0.75, and (2) high SNR with µ = 15, which leads to n·SNR² = 6.75. We use the same two-layer networks introduced in Section 2. For classification, we set a learning rate of η = 0.1 and train for 500 iterations. For the diffusion model, we minimize the DDPM loss by averaging over the diffusion noise, following the standard training of diffusion models. In particular, for each sample, we sample nϵ = 2000 noises at each iteration and calculate the loss by averaging over the noise. For the noise coefficients, we consider a time t = 0.2 and set αt = exp(−t) ≈ 0.82 and βt = √(1 − exp(−2t)) ≈ 0.57. For the diffusion model, we set η = 0.5 and train for 40000 iterations.
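The quoted setup fully determines the noise coefficients and the Monte-Carlo averaging of the DDPM loss. A minimal sketch of both, assuming the standard ε-prediction DDPM objective; the `denoiser` argument is a hypothetical stand-in for the paper's two-layer network, which is not reproduced here:

```python
import numpy as np

def ddpm_noise_coefficients(t=0.2):
    """Coefficients as reported: alpha_t = exp(-t),
    beta_t = sqrt(1 - exp(-2t)); at t = 0.2 this is ~0.82 and ~0.57."""
    alpha_t = np.exp(-t)
    beta_t = np.sqrt(1.0 - np.exp(-2.0 * t))
    return alpha_t, beta_t

def ddpm_loss(denoiser, x, n_eps=2000, t=0.2, rng=None):
    """Monte-Carlo DDPM loss: average the squared noise-prediction
    error over n_eps sampled noises per iteration, as described.
    `denoiser` is a hypothetical callable (not the paper's network)."""
    rng = np.random.default_rng(rng)
    alpha_t, beta_t = ddpm_noise_coefficients(t)
    losses = []
    for _ in range(n_eps):
        eps = rng.standard_normal(x.shape)
        x_t = alpha_t * x + beta_t * eps                 # forward noising
        losses.append(np.mean((denoiser(x_t) - eps) ** 2))
    return float(np.mean(losses))
```

The ε-prediction form of the loss is a standard choice; the paper's exact parameterization of the denoiser output may differ.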