Improved Convergence Rate for Diffusion Probabilistic Models

Authors: Gen Li, Yuchen Jiao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Toward closing this gap, we establish an iteration complexity at the order of d^(1/3)/ε^(2/3), which is better than d^(5/12)/ε, the best known complexity achieved before our work. This convergence analysis is based on a randomized midpoint method, which was first proposed for log-concave sampling (Shen & Lee, 2019) and then extended to diffusion models by Gupta et al. (2024). Our theory accommodates ε-accurate score estimates and does not require log-concavity of the target distribution. Moreover, the algorithm can also be parallelized to run in only O(log^2(d/ε)) parallel rounds, in a similar way to prior works.
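The two complexity orders quoted above can be compared numerically. The snippet below is only an order-of-magnitude illustration (constants and log factors are dropped, so these are not actual iteration counts):

```python
# Compare the paper's iteration-complexity order, d^(1/3)/eps^(2/3),
# against the prior best, d^(5/12)/eps, at an illustrative (d, eps).
d, eps = 1000, 0.01

ours = d ** (1 / 3) * eps ** (-2 / 3)    # this paper: d^(1/3) / eps^(2/3)
prior = d ** (5 / 12) * eps ** (-1)      # prior best: d^(5/12) / eps

print(ours, prior)
print(ours < prior)  # True: the new rate is order-wise smaller
```

For any d ≥ 1 and ε ≤ 1 the new order is no larger, since d^(1/3) ≤ d^(5/12) and ε^(-2/3) ≤ ε^(-1).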
Researcher Affiliation Academia Gen Li Department of Statistics The Chinese University of Hong Kong Hong Kong EMAIL Yuchen Jiao Department of Statistics The Chinese University of Hong Kong Hong Kong EMAIL
Pseudocode | Yes | 3.1 ALGORITHM: This part is devoted to explaining the details of our score-based sampler with randomized midpoints. Before proceeding, let us first specify the choice of learning rates to be used in our sampler. Randomized schedule. Similar to prior works, we adopt the following randomized learning rate schedule... Sampling procedure. With the learning schedule in hand, we are now ready to introduce the sampling procedure. The algorithm is a discretization of the probability-flow ODE incorporating some stochastic noise, which proceeds as follows. We start from Y_0 ~ N(0, I_d), and then for k = 0, ..., K−1, we keep updating Y_k through the formula

Y_{k+1} = √((1−τ_{k+1,0})/(1−τ_{k,N})) Y_{k,N} + √((τ_{k+1,0}−τ_{k,N})/(1−τ_{k,N})) Z_k,   (11a)

where Z_k ~ i.i.d. N(0, I), and for n = 1, ..., N, we compute

Y_{k,n}/√(1−τ_{k,n}) = Y_k/√(1−τ_{k,0}) + s_{k,1}(Y_k)/(2(1−τ_{k,0})^{3/2}) · (τ_{k,0}−τ̂_{k,0}) + Σ_{i=1}^{n−2} s_{k,i+1}(Y_{k,i})/(2(1−τ_{k,i})^{3/2}) · (τ̂_{k,i−1}−τ̂_{k,i}) + s_{k,n}(Y_{k,n−1})/(2(1−τ_{k,n−1})^{3/2}) · (τ̂_{k,n−1}−τ_{k,n}).   (11b)
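The core idea of the sampler quoted above — evaluating the score at a uniformly random point inside each discretization step — can be illustrated on a toy problem. The sketch below is NOT the paper's exact update (11a)–(11b): it applies a generic randomized-midpoint step to the probability-flow ODE of a VP diffusion with a one-dimensional Gaussian target, whose score is available in closed form. All names (`score`, `drift`, `randomized_midpoint_sampler`) and the specific schedule are illustrative assumptions.

```python
import numpy as np

# Toy randomized-midpoint discretization of the probability-flow ODE
#   dy/dt = -y/2 - s_t(y)/2   for the VP forward SDE  dx = -x/2 dt + dW,
# integrated backward from t = T to t = 0.  Target: N(0, 4), so the noised
# marginal at time t is N(0, v(t)) with v(t) = 1 + 3*exp(-t), giving the
# exact score s_t(y) = -y / v(t).  Illustrative sketch only, not the
# paper's update (11a)-(11b).

def score(y, t):
    # exact score of the noised marginal N(0, 1 + 3*exp(-t))
    return -y / (1.0 + 3.0 * np.exp(-t))

def drift(y, t):
    # probability-flow ODE drift
    return -0.5 * y - 0.5 * score(y, t)

def randomized_midpoint_sampler(n_samples, T=8.0, K=200, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_samples)   # y_T ~ N(0, 1), close to p_T
    h = T / K
    t = T
    for _ in range(K):
        u = rng.uniform()                     # random point inside the step
        y_mid = y - u * h * drift(y, t)       # Euler predictor to t - u*h
        y = y - h * drift(y_mid, t - u * h)   # full step with midpoint slope
        t -= h
    return y

samples = randomized_midpoint_sampler(200_000)
print(np.std(samples))  # should land near 2.0, the target's std
```

The benefit of the random evaluation point is that, in expectation over u, the single score evaluation per step behaves like an unbiased estimate of the integrated drift over the step, which is what drives the improved discretization error in this line of analysis.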
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to any code repositories.
Open Datasets | No | The paper is theoretical and does not conduct experiments requiring datasets. While it mentions 'CIFAR-10' and 'ImageNet 256×256' as examples in the introduction, these are for illustrative purposes and not used in the paper's analysis.
Dataset Splits | No | The paper focuses on theoretical analysis and does not involve empirical evaluation with datasets, so no dataset splits are provided.
Hardware Specification | No | The paper is theoretical and reports no experimental results that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe an experimental setup requiring specific software versions. No software dependencies with version numbers are listed.
Experiment Setup | No | The paper is primarily theoretical, focusing on convergence rates and mathematical analysis. It does not describe an experimental setup with hyperparameters or training configurations.