Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

Authors: Chao Li, Jiawei Fan, Anbang Yao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves a lossless speedup of 1.78× to 3.31× on average over a wide range of sampling step budgets, relative to 9 baseline diffusion models on 6 image generation tasks. Furthermore, we show that our method can also be generalized to improve the Latent Consistency Model (LCM-SDXL, which is already accelerated with the consistency distillation technique) tailored for few-step text-to-image synthesis. The code and models are available at https://github.com/deepoptimization/Morse. In the experiments, we evaluate our Morse with the mainstream samplers, including DDPM (Ho et al., 2020), DDIM (Song et al., 2021a), and DPM-Solver (Lu et al., 2022) as discrete samplers, and SDE (Song et al., 2021b) and DPM-Solver on SDE as continuous samplers. We conduct the experiments on the CIFAR-10 (Krizhevsky, 2009) benchmark, which is adopted by all the above samplers. As shown in Fig. 4, our Morse accelerates DMs consistently with all the samplers under different LSDs ranging from 3 to 100, achieving average speedups ranging from 2.01× to 2.94×.
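The discrete samplers cited above share the same denoising-update structure. As a hedged illustration (this is generic DDIM, not the paper's Morse dual-sampling code), a deterministic DDIM step (Song et al., 2021a, with eta = 0) can be sketched as:

```python
import numpy as np

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0), following Song et al., 2021a.

    x_t: current noisy sample; eps: the network's noise prediction at step t;
    alpha_bar_*: cumulative products of the noise schedule at t and t-1.
    """
    # Estimate of the clean sample x_0 implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Move x0_pred back to the previous (less noisy) timestep.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps
```

With eps = 0 the step simply rescales x_t toward the clean-sample estimate, and when alpha_bar_prev equals alpha_bar_t the update is the identity, which is a quick sanity check on the formula.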
Researcher Affiliation | Industry | Intel Labs China. Correspondence to: Anbang Yao <EMAIL>.
Pseudocode | Yes | Algorithm 1: Training of Dot with DDIM. Algorithm 2: DDIM Sampling with Morse.
Open Source Code | Yes | The code and models are available at https://github.com/deepoptimization/Morse.
Open Datasets | Yes | In the experiments, we evaluate our Morse with the mainstream samplers... We conduct the experiments with the CIFAR-10 (Krizhevsky, 2009) benchmark... we further evaluate our Morse with 5 popular image generation benchmarks, including CIFAR-10 (32×32) (Krizhevsky, 2009), ImageNet (64×64) (Russakovsky et al., 2015), CelebA (64×64) (Liu et al., 2015), CelebA-HQ (256×256), and LSUN-Church (256×256) (Yu et al., 2015). For text-to-image generation, we select Stable Diffusion v1.4 as our Dash model, which is pre-trained with around 2 billion text-image pairs from the LAION-5B dataset (Schuhmann et al., 2022)... we use the 30,000 generated samples with the prompts from the MS-COCO (Lin et al., 2014) validation set for evaluation.
Dataset Splits | Yes | we use the 30,000 generated samples with the prompts from the MS-COCO (Lin et al., 2014) validation set for evaluation... For each DM, we generate 50,000 samples and calculate the FID score between the generated images and the images of the corresponding benchmark... we evaluate the text-to-image diffusion models under zero-shot text-to-image generation on the MS-COCO 2014 validation set (Lin et al., 2014) (256×256).
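The FID score referenced above is the Fréchet distance between Gaussian fits of real and generated feature statistics. As a hedged toy sketch, specialized to diagonal covariances (real FID uses Inception-v3 features and full covariance matrices with a matrix square root, e.g. scipy.linalg.sqrtm; the function name here is illustrative):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    General form: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)).
    With diagonal covariances, the trace term reduces to a per-feature
    sum of (sqrt(v1) - sqrt(v2))^2.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return mean_term + cov_term
```

Identical statistics give a distance of exactly 0, which is why lower FID indicates generated samples whose feature statistics better match the benchmark images.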
Hardware Specification | Yes | All the speeds for different models are tested using an NVIDIA GeForce RTX 3090. All the experiments are performed on a server having 8 NVIDIA Tesla V100 GPUs. The speeds are tested with the batch size of 20 on a single NVIDIA RTX 3090 GPU. The Dot models are trained on servers with 8 NVIDIA Tesla V100 GPUs or 8 NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | We typically set the number of extra down-sampling and up-sampling blocks m to 2, leading to N in the range of 5 to 10. All the experiments are performed on servers having 8 NVIDIA GeForce RTX 3090 GPUs. When testing the speeds, we set the batch size to 100 for most benchmarks, except 20 for the CelebA-HQ and MS-COCO benchmarks. In our experiments, the Dot model is trained with only about 2M text-image pairs at resolution 512×512 sampled from the LAION-5B dataset for 100,000 iterations. We use DDIM as the sampler. We set the rank of LoRA to 64. During the training procedure, we also randomly sample guidance scales between 2 and 10. The Dot model is trained with about 2M text-image pairs at resolution 1024×1024 from the LAION-5B dataset for 100,000 iterations.
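The setup trains the Dot model with a LoRA of rank 64. As a hedged sketch of the general LoRA mechanism (not the paper's implementation; the function name and shapes are illustrative), a rank-r update to a frozen linear layer looks like:

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """Forward pass of a frozen linear layer W plus a rank-r LoRA update B @ A.

    Shapes: x (batch, d_in); W (d_out, d_in); A (r, d_in); B (d_out, r).
    Only A and B are trained; the base weight W stays frozen, so the
    adapter adds just r * (d_in + d_out) parameters per layer.
    """
    return x @ W.T + scale * (x @ A.T) @ B.T
```

When B (or A) is initialized to zero, the adapted layer starts out exactly equal to the frozen base layer, which is the usual LoRA initialization and keeps early training stable.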