Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion

Authors: Enrico Ventura, Beatrice Achilli, Gianluigi Silvestri, Carlo Lucibello, Luca Ambrogioni

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main contributions are: (I) an in-depth theoretical random-matrix analysis of the distribution of Jacobian spectra in diffusion models on linear manifolds and (II) a detailed experimental analysis of Jacobian spectra extracted from trained networks on linear manifolds and on image datasets. The analysis of these spectra is important as it provides a detailed picture of the latent geometry that guides the generative diffusion process. We show that the linear theory predicts several phenomena that we observed in trained networks.
Researcher Affiliation | Academia | (1) Department of Computing Sciences, BIDSA, Bocconi University, Milan, MI 20100, Italy; (2) Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HD Nijmegen, the Netherlands; (3) OnePlanet Research Center, imec the Netherlands, Wageningen, the Netherlands.
Pseudocode | Yes | Algorithm 1: Estimate singular values at x0; Algorithm 2: Estimate singular values at x0 with central difference.
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code for the described methodology.
Open Datasets | Yes | Fig. 5 shows the temporal evolution of the spectrum estimated numerically from the Jacobian of models trained on MNIST, CIFAR-10, and CelebA.
Dataset Splits | No | For each dataset, the number of training data points used amounts to the full set of data available. For the linear models, we used a Variance Exploding continuous score model trained with 2M steps (batch size 128).
Hardware Specification | Yes | For all experiments we primarily utilized NVIDIA Tesla V100 GPUs with 32 GB of memory.
Software Dependencies | No | The paper mentions using a PixelCNN++ model architecture but does not specify any software libraries or frameworks with version numbers for reproducibility.
Experiment Setup | Yes | We use the variance scheduler with βmin = 10^-4 and βmax = 2 × 10^-2, T = 1000 time steps, and a PixelCNN++ score-model backbone (Salimans et al., 2017). Furthermore, for each of the datasets, we adjusted the parameters to account for the different complexity (see Table 1). For the linear models, we used a Variance Exploding continuous score model trained with 2M steps (batch size 128). The model had a residual architecture with 128 hidden channels in each layer and two residual blocks, each comprising two linear layers with SiLU activations.
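The central-difference estimator named under Pseudocode (Algorithm 2) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a generic score function `score_fn` and a toy linear score whose Jacobian is known exactly, so the recovered singular values can be checked by hand.

```python
import numpy as np

def jacobian_singular_values(score_fn, x0, eps=1e-4):
    """Estimate the singular values of the Jacobian of score_fn at x0
    using central finite differences (in the spirit of Algorithm 2)."""
    d = x0.size
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        # Central difference along coordinate i: (s(x0+e) - s(x0-e)) / (2*eps)
        J[:, i] = (score_fn(x0 + e) - score_fn(x0 - e)) / (2 * eps)
    # Singular values of the estimated Jacobian
    return np.linalg.svd(J, compute_uv=False)

# Toy check: a linear score s(x) = A x has Jacobian A, so the estimated
# singular values should match those of A (here 2.0 and 0.5).
A = np.array([[2.0, 0.0], [0.0, 0.5]])
svals = jacobian_singular_values(lambda x: A @ x, np.zeros(2))
```

For a linear score the central difference is exact up to round-off, which is why the toy case is a useful sanity check; for a trained network the step size `eps` trades truncation error against floating-point noise.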
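The Experiment Setup row fixes βmin, βmax, and T but does not state the interpolation between them; the sketch below assumes the standard linear (DDPM-style) schedule, which is the common choice for these values.

```python
import numpy as np

# Discrete beta schedule matching the stated setup:
# beta_min = 1e-4, beta_max = 2e-2, T = 1000 time steps.
beta_min, beta_max, T = 1e-4, 2e-2, 1000
betas = np.linspace(beta_min, beta_max, T)  # linear interpolation (assumed)

# Cumulative signal-retention coefficients alpha_bar_t used in
# DDPM-style forward noising: x_t ~ N(sqrt(alpha_bar_t) x_0, (1-alpha_bar_t) I).
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)
```

With these values alpha_bar decays to nearly zero by t = T, i.e. the forward process ends close to pure noise, which is the intended behavior of the scheduler.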