Learning to Discretize Denoising Diffusion ODEs
Authors: Vinh Tong, Trung-Dung Hoang, Anji Liu, Guy Van den Broeck, Mathias Niepert
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 (10 NFE) and 2.27 (10 NFE) on unconditional CIFAR10 and AFHQv2 in 5-10 minutes of training. |
| Researcher Affiliation | Academia | 1University of Stuttgart, 2IMPRS-IS, 3UCLA, 4University of Bern |
| Pseudocode | Yes | Algorithm 1 LD3 |
| Open Source Code | Yes | Code is available at https://github.com/vinhsuhi/LD3. |
| Open Datasets | Yes | We evaluate 7 pre-trained diffusion models across different domains. For pixel-space models, we include CIFAR10 (32×32) (Krizhevsky & Hinton, 2009), FFHQ (64×64) (Karras et al., 2019), and AFHQv2 (64×64) (Choi et al., 2020). For latent-space models, we assess LSUN-Bedroom (256×256) (Yu et al., 2015) and class-conditional ImageNet (256×256) (Russakovsky et al., 2015). |
| Dataset Splits | Yes | For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. ... For Latent Diffusion (Rombach et al., 2022) on ImageNet and LSUN-Bedroom, we use 100 samples for both training and validation, with the training conducted over 5 epochs. ... We evaluate our model using the FID score with 50,000 randomly generated samples. |
| Hardware Specification | Yes | For instance, at 10 NFE, our model needs approximately 36 minutes on a single NVIDIA A100 GPU, whereas AYS requires 3 to 4 hours on 8 NVIDIA RTX6000s. |
| Software Dependencies | No | The paper mentions several solvers such as DPM-Solver++, UniPC, and iPNDM, and the use of codebases from other papers (e.g., Luo & Hu, 2021). However, it does not specify explicit version numbers for general software libraries or programming languages used in the authors' own implementation, such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. We set r proportional to the dimensionality d and inversely proportional to the squared NFE: r = γ·d/NFE², where γ = 0.001 in all experiments. ... RMSprop for ξ and SGD for both ξ_c and x_T. The learning rates are denoted as l_ξ, l_ξc, and l_xT. We set l_ξ = 0.005 for pixel-space datasets and l_ξ = 0.001 for latent-space datasets, while l_ξc and l_xT are NFE-dependent. |
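As a worked illustration of the hyperparameter rule quoted in the Experiment Setup row (r = γ·d/NFE² with γ = 0.001), here is a minimal sketch; the function and variable names are ours, not taken from the LD3 codebase:

```python
# Sketch of the r hyperparameter from the quoted setup: r = gamma * d / NFE^2,
# with gamma = 0.001 in all experiments (per the paper's excerpt).
def compute_r(d: int, nfe: int, gamma: float = 0.001) -> float:
    """r is proportional to the data dimensionality d and inversely
    proportional to the squared number of function evaluations (NFE)."""
    return gamma * d / nfe ** 2

# Example: CIFAR10 images are 3x32x32, so d = 3072.
d_cifar10 = 3 * 32 * 32
print(compute_r(d_cifar10, nfe=10))  # 0.001 * 3072 / 100 = 0.03072
```

This makes concrete how r shrinks as the sampling budget (NFE) grows, while scaling up with higher-dimensional data.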