Learning to Discretize Denoising Diffusion ODEs

Authors: Vinh Tong, Trung-Dung Hoang, Anji Liu, Guy Van den Broeck, Mathias Niepert

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 (10 NFE) and 2.27 (10 NFE) on unconditional CIFAR10 and AFHQv2 in 5-10 minutes of training.
Researcher Affiliation | Academia | (1) University of Stuttgart, (2) IMPRS-IS, (3) UCLA, (4) University of Bern
Pseudocode | Yes | Algorithm 1 LD3
Open Source Code | Yes | Code is available at https://github.com/vinhsuhi/LD3.
Open Datasets | Yes | We evaluate 7 pre-trained diffusion models across different domains. For pixel-space models, we include CIFAR10 (32×32) (Krizhevsky & Hinton, 2009), FFHQ (64×64) (Karras et al., 2019), and AFHQv2 (64×64) (Choi et al., 2020). For latent-space models, we assess LSUN-Bedroom (256×256) (Yu et al., 2015) and class-conditional ImageNet (256×256) (Russakovsky et al., 2015).
Dataset Splits | Yes | For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. ... For Latent Diffusion (Rombach et al., 2022) on ImageNet and LSUN-Bedroom, we use 100 samples for both training and validation, with the training conducted over 5 epochs. ... We evaluate our model using the FID score with 50,000 randomly generated samples.
Hardware Specification | Yes | For instance, at 10 NFE, our model needs approximately 36 minutes on a single NVIDIA A100 GPU, whereas AYS requires 3 to 4 hours on 8 NVIDIA RTX 6000s.
Software Dependencies | No | The paper mentions several solvers, such as DPM-Solver++, UniPC, and iPNDM, and uses codebases from other papers (e.g., Luo & Hu, 2021). However, it does not specify version numbers for the general software libraries or languages used in the authors' own implementation, such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. We set r proportional to the dimensionality d and inversely proportional to the squared NFE: r = γ·d/NFE², where γ = 0.001 in all experiments. ... RMSprop for ξ and SGD for both ξ_c and x_T. The learning rates are denoted l_ξ, l_{ξ_c}, and l_{x_T}. We set l_ξ = 0.005 for pixel-space datasets and l_ξ = 0.001 for latent-space datasets, while l_{ξ_c} and l_{x_T} are NFE-dependent.
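The reported coefficient rule r = γ·d/NFE² can be sketched directly; this is an illustrative helper (the function name and call are not from the paper's code, only the formula and γ = 0.001 are):

```python
def reg_coefficient(d: int, nfe: int, gamma: float = 0.001) -> float:
    """Compute r = gamma * d / NFE^2: proportional to the data
    dimensionality d, inversely proportional to the squared NFE."""
    return gamma * d / nfe ** 2

# Example: a pixel-space CIFAR10 sample has d = 3 * 32 * 32 = 3072;
# at 10 NFE this gives r = 0.001 * 3072 / 100 = 0.03072.
r = reg_coefficient(d=3 * 32 * 32, nfe=10)
```

Note how r shrinks quadratically as the NFE budget grows, so the term matters most in the few-step regime the paper targets.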