The Geometry of Phase Transitions in Diffusion Models: Tubular Neighbourhoods and Singularities

Authors: Manato Yaguchi, Kotaro Sakamoto, Ryosuke Sakamoto, Masato Tanabe, Masatomo Akagawa, Yusuke Hayashi, Masahiro Suzuki, Yutaka Matsuo

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"To test our hypothesis, we conduct experiments using synthetic data and demonstrate that, under conditions of constant curvature, the hypothesis holds true. In contrast, in scenarios where the curvature of the data manifold is non-constant, singularities corresponding to varying curvatures can emerge, leading to the possibility of multiple phase transitions. Moreover, we show that the concept of the tubular neighbourhood corresponds to the final phase transition in the generative process. Finally, we experimentally demonstrate that by embedding the original data distribution into a hypersurface, the theory of the tubular neighbourhood can be leveraged to achieve more efficient sampling. Our code can be found at https://github.com/yagumana/lateinit."
"In this section, we empirically demonstrate the presence of phase transitions at the boundary of the tubular neighbourhood during the generative process of diffusion models."
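The "more efficient sampling" claim rests on late initialisation: starting the DDPM reverse process at some step t_start < T, on the grounds that the marginal at that step is still approximately Gaussian up to the tubular-neighbourhood boundary. A minimal sketch of that idea is below; the zero-noise stand-in model, the choice t_start = 400, and the 2-D data dimension are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np

# Standard DDPM linear schedule as stated in the paper (T = 1000, beta: 1e-4 -> 2e-2).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_reverse(eps_model, t_start, dim, rng):
    """Run the DDPM reverse process, initialised at step t_start < T
    ("late initialisation") instead of at T. eps_model(x, t) predicts the
    injected noise; any callable with that signature works here."""
    x = rng.standard_normal(dim)  # late init: marginal at t_start approximated by N(0, I)
    for t in range(t_start - 1, -1, -1):
        eps = eps_model(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])   # posterior mean (Ho et al., 2020)
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x

# Purely illustrative "model" that predicts zero noise everywhere.
rng = np.random.default_rng(0)
sample = ddpm_reverse(lambda x, t: np.zeros_like(x), t_start=400, dim=2, rng=rng)
```

With a trained score network in place of the placeholder, skipping steps T down to t_start saves the corresponding fraction of network evaluations.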
Researcher Affiliation: Collaboration
"Manato Yaguchi (EMAIL), Graduate School of Engineering, The University of Tokyo, Tokyo, Japan; Kotaro Sakamoto (EMAIL), Graduate School of Engineering, The University of Tokyo, Tokyo, Japan; Ryosuke Sakamoto (EMAIL), Department of Mathematics, Hokkaido University, Sapporo, Japan; Yusuke Hayashi (EMAIL), AI Alignment Network, Tokyo, Japan; Humanity Brain, Tokyo, Japan"
Pseudocode: Yes
"As a core part of our geometric approach, we propose an algorithm to estimate the injectivity radius of the data manifold (Section 3). This allows us to define and quantify the tubular neighbourhood, providing a practical tool for characterizing the manifold's geometric properties in diffusion models (see Algorithm 1 in Appendix F)."
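Algorithm 1 itself lives in Appendix F and is not reproduced in this report. As a purely illustrative stand-in (not the paper's algorithm), the radius of a tubular neighbourhood of a sampled manifold can be upper-bounded with the classical pairwise reach estimator r <= ||q - p||^2 / (2 * dist(q - p, T_p M)); on a unit circle, where tangents are known analytically, every pair gives exactly 1.

```python
import numpy as np

# Illustrative reach (tubular-neighbourhood radius) estimate on a unit circle.
# NOT the paper's Algorithm 1; tangents are taken analytically for simplicity.
n = 200
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)     # samples on S^1
tangents = np.stack([-np.sin(theta), np.cos(theta)], axis=1)  # unit tangent at each sample

best = np.inf
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        d = points[j] - points[i]
        perp = d - (d @ tangents[i]) * tangents[i]  # component normal to T_p M
        norm_perp = np.linalg.norm(perp)
        if norm_perp > 1e-12:
            best = min(best, (d @ d) / (2.0 * norm_perp))

print(best)  # ~ 1.0, the reach of the unit circle
```

On data with estimated (rather than analytic) tangents, the same bound can be evaluated after a local PCA step, but that choice is outside what this report can verify.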
Open Source Code: Yes
"Our code can be found at https://github.com/yagumana/lateinit."
Open Datasets: Yes
"See Figure 2 for a CIFAR-10 example at around step 400. To address this limitation, we embed each dataset into a unit hypersphere using a Hyperspherical VAE (Davidson et al., 2018). This procedure maps all data points onto S^20 in R^24, thereby constraining their global geometry to a fixed radius. While this embedding does not yield a uniform distribution on the hypersphere, it simplifies the analysis of phase transitions by ensuring that large-scale curvature effects are more tractable. The detailed setup of the Hyperspherical VAE is provided in Appendix J.7."
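The paper performs this embedding with a trained Hyperspherical VAE (Davidson et al., 2018). As a minimal stand-in for the one constraint that matters for the geometric analysis, latent codes can be projected onto the unit hypersphere by normalisation; the batch size and the 24-dimensional latent space below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
latents = rng.standard_normal((128, 24))  # hypothetical latent codes in R^24

# Radial projection onto the unit hypersphere: every point gets a fixed radius,
# so the global geometry of the embedded data is spherical by construction.
on_sphere = latents / np.linalg.norm(latents, axis=1, keepdims=True)

radii = np.linalg.norm(on_sphere, axis=1)
```

An S-VAE additionally learns the mapping (via a von Mises-Fisher posterior) rather than projecting raw data, which is why the resulting distribution on the sphere is not uniform.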
Dataset Splits: No
"The training dataset consists of 50,000 points sampled from a uniform distribution. The model is trained without using any advanced samplers like DDIM, relying solely on the standard DDPM reverse process. In our approach, several key modifications were made to the original hyperspherical VAE (S-VAE) (Davidson et al., 2018) setup used in prior studies. One significant change was transitioning from a binary data representation to a continuous one."
Hardware Specification: No
"The model is trained using the mean squared error (MSE) loss function, with AdamW as the optimizer. The learning rate is set to 1 × 10^-3, and the batch size is 32. For toy data experiments, the training dataset consists of 50,000 points sampled from a uniform distribution. The model is trained without using any advanced samplers like DDIM, relying solely on the standard DDPM reverse process."
No specific hardware details were provided in the paper.
Software Dependencies: No
"We employ the DDPM framework (Ho et al., 2020) with T = 1000 diffusion steps in the forward and reverse processes. All experiments share the same neural network architecture for score estimation and use the Adam optimizer. We use the POT library to solve this optimal transport problem in practice."
No specific version numbers for software dependencies are provided in the paper.
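In practice the POT library solves the optimal transport problem from a pairwise cost matrix (e.g. via `ot.emd2(a, b, ot.dist(xs, xt))`). As a self-contained illustration of the same quantity without that dependency, the squared 2-Wasserstein distance between two equal-size 1-D samples reduces to matching sorted order; the sample sizes and shift below are arbitrary.

```python
import numpy as np

def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.
    In 1-D the optimal transport plan simply matches points in sorted order."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean((xs - ys) ** 2)

# Sanity check: shifting a sample by c gives W2^2 = c^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(w2_squared_1d(x, x + 3.0))  # ~ 9.0
```

In higher dimensions no such closed form exists, which is where a solver like POT becomes necessary.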
Experiment Setup: Yes
"Diffusion Model Setup. We employ the DDPM framework (Ho et al., 2020) with T = 1000 diffusion steps in the forward and reverse processes. Let {x_0, x_1, ..., x_T} denote a trajectory under Gaussian noise perturbation; we follow the standard parametrisation where x_T ~ N(0, I) and each intermediate x_t is obtained by adding Gaussian noise according to a fixed variance schedule. All experiments share the same neural network architecture for score estimation and use the Adam optimizer. The batch size, learning rates, and number of training iterations are kept consistent across experiments unless stated otherwise. Additional hyperparameter details are provided in Appendix J. In previous studies (Raya & Ambrogioni, 2023), the training of diffusion models was performed using DDPM. The number of time steps is set to 1000, and the noise schedule coefficient β linearly increases from 1.0 × 10^-4 to 2.0 × 10^-2. A key difference from prior work is that, for denoising, the MLP layers have been replaced with a 1D U-Net. ... The learning rate is set to 1 × 10^-3, and the batch size is 32."
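The stated forward process (T = 1000, β linear from 1.0 × 10^-4 to 2.0 × 10^-2) can be written out directly using the standard DDPM closed-form marginal x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε; the toy data below is a placeholder.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)  # linear noise schedule from the paper
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot via the closed-form marginal."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))    # placeholder toy data
xT = q_sample(x0, T - 1, rng)

# By the end of this schedule essentially no signal remains,
# so x_T is close to N(0, I) as the parametrisation requires.
print(alpha_bars[-1])  # ~ 4e-5
```

This is the standard schedule from Ho et al. (2020); the 1D U-Net denoiser the paper substitutes for MLP layers sits on top of exactly this forward process.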