An optimal control perspective on diffusion-based generative modeling

Authors: Julius Berner, Lorenz Richter, Karen Ullrich

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental
We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples. Our experiments show that DIS can offer improvements over PIS for all the tasks we considered, i.e., estimation of normalizing constants, expectations, and standard deviations. Figures 4 and 5 display direct comparisons for the examples in Section 4.1.
Researcher Affiliation | Collaboration
Julius Berner (EMAIL), Caltech; Lorenz Richter (EMAIL), Zuse Institute Berlin and dida Datenschmiede GmbH; Karen Ullrich (EMAIL), Meta AI
Pseudocode | Yes
Algorithm 1 (Time-reversed diffusion sampler, DIS): training the control in Section 3 via deep learning.
input: neural network u_θ with initial parameters θ^(0); optimizer method step for updating the parameters; number of steps K; batch size m
output: parameters (θ^(k))_{k=1}^K
for k = 0, ..., K - 1 do
    (x^(i))_{i=1}^m ← sample from N(0, I)
    L̂_DIS(u_{θ^(k)}) ← estimate the cost in (19) using the EM scheme with X̂^θ_0 = x^(i), i = 1, ..., m
    θ^(k+1) ← step(θ^(k), L̂_DIS(u_{θ^(k)}))
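A minimal PyTorch sketch of the training loop in Algorithm 1, assuming a generic network and treating the loss estimator as a placeholder (the actual cost in Eq. (19) requires simulating the controlled SDE with the Euler-Maruyama scheme; see the paper's repository for the reference implementation):

```python
import torch

def train_dis(u_theta, dis_loss, num_steps=100, batch_size=2048, lr=5e-3):
    """Sketch of Algorithm 1: train the control u_theta of the
    time-reversed diffusion sampler (DIS).

    `dis_loss(u_theta, x0)` is a placeholder for a Monte Carlo estimate
    of the DIS cost in Eq. (19), obtained from initial samples x0.
    """
    optimizer = torch.optim.Adam(u_theta.parameters(), lr=lr, weight_decay=1e-7)
    for k in range(num_steps):
        # (x^(i))_{i=1}^m ~ N(0, I)
        x0 = torch.randn(batch_size, u_theta.dim)
        loss = dis_loss(u_theta, x0)  # estimate the cost in (19)
        optimizer.zero_grad()
        loss.backward()
        # l2-norm gradient clipping, as listed in Table 2
        torch.nn.utils.clip_grad_norm_(u_theta.parameters(), 1.0)
        optimizer.step()  # θ^(k+1) ← step(θ^(k), ...)
    return u_theta
```

The optimizer settings (Adam, weight decay 1e-7, clipping at 1) follow Table 2; the network architecture and the clipping of Φ^(1)_θ, Φ^(2)_θ are omitted here.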
Open Source Code Yes The associated repository can be found at https://github.com/juliusberner/sde_sampler.
Open Datasets | No
Gaussian mixture model (GMM): We consider ρ(x) = Σ_{m=1}^M α_m N(x; μ̃_m, Σ_m) with Σ_{m=1}^M α_m = 1. Specifically, we choose M = 9, Σ_m = 0.3 I, and (μ̃_m)_{m=1}^9 = {-5, 0, 5} × {-5, 0, 5} ⊂ R².
Funnel: The 10-dimensional Funnel distribution (Neal, 2003) is a challenging example often used to test MCMC methods. It is given by ρ(x) = N(x_1; 0, ν²) Π_{i=2}^{10} N(x_i; 0, e^{x_1}) for x = (x_i)_{i=1}^{10} ∈ R^{10} with ν = 3.
Double well (DW): A typical problem in molecular dynamics considers sampling from the stationary distribution of a Langevin dynamics, where the drift of the SDE is given by the negative gradient of a potential Ψ, namely dX_s = -∇Ψ(X_s) ds + σ(s) dB_s; see, e.g., Leimkuhler & Matthews (2015). Given certain assumptions on the potential, the stationary density of the process can be shown to be p_X = e^{-Ψ}/Z.
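As a sketch of the three target densities described above (an illustrative NumPy version under the stated parameters, not the paper's reference code), the log-densities could be written as:

```python
import numpy as np

def gmm_logpdf(x, means, sigma2=0.3):
    """Log-density of an equal-weight Gaussian mixture with covariance sigma2*I.
    `x`: point of shape (d,); `means`: component means of shape (M, d)."""
    d = x.shape[-1]
    sq = ((x - means) ** 2).sum(-1)  # squared distance to each mean, shape (M,)
    log_comp = -0.5 * sq / sigma2 - 0.5 * d * np.log(2 * np.pi * sigma2)
    return np.logaddexp.reduce(log_comp) - np.log(len(means))

def funnel_logpdf(x, nu=3.0):
    """Log-density of the 10-d Funnel: x1 ~ N(0, nu^2), x_i | x1 ~ N(0, e^{x1})."""
    x1, rest = x[0], x[1:]
    lp1 = -0.5 * x1**2 / nu**2 - 0.5 * np.log(2 * np.pi * nu**2)
    lp_rest = -0.5 * rest**2 * np.exp(-x1) - 0.5 * (x1 + np.log(2 * np.pi))
    return lp1 + lp_rest.sum()

def double_well_logpdf_unnorm(x, psi):
    """Unnormalized log-density -Psi(x) of the Langevin stationary
    distribution p_X = e^{-Psi}/Z for a user-supplied potential `psi`."""
    return -psi(x)

# GMM means: the 3x3 grid {-5, 0, 5} x {-5, 0, 5} in R^2
gmm_means = np.array([[a, b] for a in (-5.0, 0.0, 5.0) for b in (-5.0, 0.0, 5.0)])
```

Since these densities are only needed up to normalization for sampling, working in log-space (as above) avoids underflow far from the modes.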
Dataset Splits | No
The paper defines mathematical distributions (Gaussian mixture model, Funnel, Double well) and trains models to sample from them. It describes the characteristics of these distributions but does not use or specify traditional training/validation/test splits of a pre-existing dataset.
Hardware Specification | Yes
GPU: Tesla V100 (32 GiB)
Software Dependencies | No
The framework is PyTorch (Paszke et al., 2019). While PyTorch is mentioned and cited, a specific version number for the library used (e.g., PyTorch 1.9) is not provided.
Experiment Setup | Yes
Table 2: Hyperparameters

DIS SDE:
- inference SDE (corresponding to Y): Variance-Preserving SDE with linear schedule (Song et al., 2021)
- min. diffusivity σ_min: 0.1
- max. diffusivity σ_max: 10
- terminal time T: 1
- initial distribution X^u_0: N(0, I) (truncated to 99.99% of mass)

PIS SDE (Zhang & Chen, 2022a):
- uncontrolled process X^0: scaled Brownian motion
- drift f: 0 (constant)
- diffusivity: 0.2 (constant)
- terminal time T: 5
- initial distribution: Dirac delta δ_0 at the origin

SDE solver:
- type: Euler-Maruyama (Kloeden & Platen, 1992)
- steps N (see also figure descriptions): [100, 200, 400, 800] (each for 1/4 of the total gradient steps K)

Training:
- optimizer: Adam (Kingma & Ba, 2014)
- weight decay: 10^-7
- learning rate: 0.005
- batch size m: 2048
- gradient clipping: 1 (ℓ2-norm)
- clipping of Φ^(1)_θ(·, s) to [-c, c] and Φ^(2)_θ to [-c, c]^d: c = 10 (step ≤ 200), c = 50 (200 < step ≤ 400), c = 250 (else)
- gradient steps K: 20000 (d ≤ 10), 80000 (else)
- framework: PyTorch (Paszke et al., 2019)
- GPU: Tesla V100 (32 GiB)
- number of seeds: 10
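To make the SDE-solver row concrete, here is a minimal Euler-Maruyama sketch for a controlled variance-preserving SDE with a linear schedule interpolating between σ_min = 0.1 and σ_max = 10 over T = 1. The schedule form β(s) = σ_min + (s/T)(σ_max - σ_min) is an assumption following Song et al. (2021), and `control` is a placeholder for the learned drift u_θ; the paper's exact controlled dynamics may be parametrized differently:

```python
import numpy as np

def euler_maruyama_vp(x0, control, n_steps=100, T=1.0,
                      sigma_min=0.1, sigma_max=10.0, rng=None):
    """Euler-Maruyama integration of a controlled VP SDE (sketch):

        dX_s = [-0.5 * beta(s) * X_s + sqrt(beta(s)) * control(X_s, s)] ds
               + sqrt(beta(s)) dB_s,

    with the assumed linear schedule beta(s) = sigma_min + (s/T)(sigma_max - sigma_min).
    """
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    dt = T / n_steps
    for k in range(n_steps):
        s = k * dt
        beta = sigma_min + (s / T) * (sigma_max - sigma_min)
        drift = -0.5 * beta * x + np.sqrt(beta) * control(x, s)
        # diffusion term: sqrt(beta * dt) * standard normal increment
        x = x + drift * dt + np.sqrt(beta * dt) * rng.standard_normal(x.shape)
    return x

# with zero control this reduces to the uncontrolled VP forward dynamics
sample = euler_maruyama_vp(np.zeros(2), control=lambda x, s: 0.0)
```

The step counts N = [100, 200, 400, 800] in Table 2 correspond to running this loop with increasingly fine discretizations over the course of training.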