Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions

Authors: Jinyoung Choi, Junoh Kang, Bohyung Han

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a series of experiments, we show that the proposed method improves the quality of generated samples without requiring additional sampling iterations. Our experiments across various well-known baselines demonstrate that RX-DPM exhibits strong generalization performance and high practicality, regardless of ODE designs, model architectures, and base samplers. We conduct the experiments with EDM (Karras et al., 2022), Stable Diffusion V2 (Rombach et al., 2022), DPM-Solver (Lu et al., 2022), and PNDM (Liu et al., 2022) using their official implementations and provided pretrained models. For experiments with EDM, DPM-Solver, and PNDM as backbones, we generate 50K images and compute FID (Heusel et al., 2017) using the evaluation code provided in their implementations. To evaluate Stable Diffusion V2 results, we use the PyTorch implementation for the computation of FID and CLIP score with a patch size of 32×32.
Researcher Affiliation | Academia | Jinyoung Choi¹, Junoh Kang¹ & Bohyung Han¹,², Computer Vision Lab., ¹ECE & ²IPAI, Seoul National University, Korea (EMAIL)
Pseudocode | Yes | Algorithm 1 summarizes the procedure of the proposed method with a generic ODE solver, under the simplifying assumption that N is a multiple of k; it is simple to handle the last few steps by either adjusting k for the remaining steps or skipping the extrapolation.
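The report does not reproduce Algorithm 1 itself, but its core idea, extrapolating between a coarse and a k-times-finer solution of the same ODE segment, can be sketched on a toy ODE. The first-order Richardson weights and the toy linear ODE below are illustrative assumptions standing in for RX-DPM's exact coefficients and the diffusion probability-flow ODE:

```python
import numpy as np

def euler_solve(f, x0, t0, t1, n_steps):
    """Plain Euler integration of dx/dt = f(t, x) from t0 to t1."""
    x, t = x0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(t, x)
        t += h
    return x

def rx_euler(f, x0, t0, t1, n_steps, k=2):
    """Extrapolated Euler sketch: on each segment, combine one coarse step
    with k fine steps. Assumes n_steps is a multiple of k, mirroring the
    paper's simplifying assumption on N. The (k, k-1) weights are standard
    first-order Richardson extrapolation, not the paper's exact scheme."""
    x = x0
    ts = np.linspace(t0, t1, n_steps // k + 1)
    for a, b in zip(ts[:-1], ts[1:]):
        coarse = euler_solve(f, x, a, b, 1)  # one big step over [a, b]
        fine = euler_solve(f, x, a, b, k)    # k small steps over [a, b]
        x = (k * fine - coarse) / (k - 1)    # cancel the O(h) error term
    return x

# Toy linear ODE dx/dt = -x with exact solution exp(-1) at t = 1.
f = lambda t, x: -x
exact = np.exp(-1.0)
plain = euler_solve(f, 1.0, 0.0, 1.0, 8)
extrap = rx_euler(f, 1.0, 0.0, 1.0, 8, k=2)
assert abs(extrap - exact) < abs(plain - exact)
```

On this toy problem the extrapolated trajectory is markedly closer to the exact solution than plain Euler with the same base grid, which is the effect the paper reports for sample quality at a fixed step budget.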
Open Source Code | Yes | The full implementation is available at https://github.com/jin01020/rx-dpm.
Open Datasets | Yes | We compare RX-Euler with other methods on four different datasets: CIFAR-10 (Krizhevsky & Hinton, 2009), FFHQ (Karras et al., 2019), AFHQv2 (Choi et al., 2020), and ImageNet (Deng et al., 2009) using the EDM (Karras et al., 2022) backbone. For evaluation, we generate 10K 512×512 images from unique text prompts in the COCO2014 (Lin et al., 2014) validation set and compute FID and CLIP scores on resized 256×256 images. Table 2 presents the effectiveness of RX-DPM when applied to DPM-Solvers (Lu et al., 2022) on CIFAR-10 and LSUN Bedroom (Yu et al., 2015). The results on the CIFAR-10, CelebA (Liu et al., 2015), and LSUN Church (Yu et al., 2015) datasets are presented in Table 3.
Dataset Splits | Yes | For experiments with EDM, DPM-Solver, and PNDM as backbones, we generate 50K images and compute FID (Heusel et al., 2017) using the evaluation code provided in their implementations. To evaluate Stable Diffusion V2 results, we use the PyTorch implementation for the computation of FID and CLIP score with a patch size of 32×32. For evaluation, we generate 10K 512×512 images from unique text prompts in the COCO2014 (Lin et al., 2014) validation set and compute FID and CLIP scores on resized 256×256 images.
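For context on the FID numbers quoted throughout, FID is the Fréchet distance between Gaussian fits (mean and covariance) of Inception features from real and generated images. A minimal NumPy-only sketch of the distance itself is below; the Inception feature extraction is omitted, and the symmetric-square-root reformulation is a standard numerical trick, not something taken from the paper:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(s1 + s2 - 2 (s1 s2)^{1/2}).
    Tr((s1 s2)^{1/2}) is computed as Tr((s1^{1/2} s2 s1^{1/2})^{1/2}),
    which keeps every matrix symmetric PSD."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give zero; shifting the mean by v adds ||v||^2.
mu, sigma = np.zeros(3), np.eye(3)
assert abs(frechet_distance(mu, sigma, mu, sigma)) < 1e-8
assert abs(frechet_distance(mu, sigma, mu + 2.0, sigma) - 12.0) < 1e-8
```

In practice the statistics come from Inception-v3 activations over the 50K (or 10K) generated images and a reference set, which is what the quoted evaluation code computes.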
Hardware Specification | Yes | For measurements, we set the batch size to 128 and use 10-step sampling on an A6000 GPU.
Software Dependencies | No | We conduct the experiments with EDM (Karras et al., 2022), Stable Diffusion V2 (Rombach et al., 2022), DPM-Solver (Lu et al., 2022), and PNDM (Liu et al., 2022) using their official implementations and provided pretrained models. To evaluate Stable Diffusion V2 results, we use the PyTorch implementation for the computation of FID and CLIP score with a patch size of 32×32. No specific software versions are provided for PyTorch or other libraries used in the official implementations.
Experiment Setup | Yes | Throughout all experiments, we retain the default settings from the official codebases, except for additional hyperparameters related to the proposed method. For measurements, we set the batch size to 128 and use 10-step sampling on an A6000 GPU.