Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation

Authors: Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical contributions with extensive empirical evaluations on diffusion models for image generation and robot control tasks. ASD leads to a 1.8-4× speedup in wall-clock time without any loss in quality. In the experiments, we empirically demonstrate the practical benefits of autospeculative decoding (ASD). We consider a diverse set of real-world applications where diffusion models are used, including image generation with both latent and pixel-space diffusion models (Rombach et al., 2022; Ho et al., 2020) and robot control with diffusion policies (Reuss et al., 2023; Chi et al., 2023).
Researcher Affiliation | Academia | 1Computer Science Department, Stanford University, California, USA. Correspondence to: Hengyuan Hu <EMAIL>, Aniket Das <EMAIL>.
Pseudocode | Yes | Algorithm 1: Autospeculative Decoding (ASD); Algorithm 2: Verifier; Algorithm 3: Gaussian Rejection Sampler (GRS)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, its own open-source code. It mentions open-sourced models and libraries, such as Stable Diffusion-v2 from the diffusers library, but these are tools the authors utilized, not their own implementation code.
Open Datasets | Yes | We use the open-sourced Stable Diffusion-v2 (Rombach et al., 2022; Schuhmann et al., 2022) model from the diffusers (von Platen et al., 2022) library... using 5000 images generated with captions from the COCO2017 captions validation dataset. ...We also evaluate ASD on the LSUN Church model from Ho et al. (2020)... We consider three hard Robomimic (Mandlekar et al., 2021) simulation environments, namely Square, Transport and Tool Hang.
Dataset Splits | Yes | The CLIP scores are computed over 1000 captions from the COCO2017 captions validation dataset. Each score is computed with 5000 image samples. In each environment, we evaluate the same diffusion policy with different sampling schemes over the same set of 100 seeds (100 random initial configurations) and repeat three times.
Hardware Specification | Yes | We measure the wall-clock speedup on a machine with 8 NVIDIA A40 GPUs.
Software Dependencies | No | The paper mentions using the diffusers (von Platen et al., 2022) library but does not specify a version number. No other specific software versions are provided.
Experiment Setup | Yes | Fig. 2 shows the algorithmic and wall-clock speedup of ASD over DDPM under 1000 denoising steps. We evaluate ASD with different speculation lengths θ... We follow prior works to set k = 16 in all environments. ...vanilla DDPM that runs for 100 steps. Empirically, we find that ASD has a much higher acceptance rate for the speculated samples in these cases, leading to a 6-7× algorithmic speedup for ASD. Due to the high acceptance rate, it requires a larger speculation length of 20 or 24 to match the efficiency of ASD.
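The paper's Algorithm 3 (Gaussian Rejection Sampler) is not reproduced in this report, but the general speculative-sampling correction it builds on can be sketched for two Gaussians with shared variance: accept a draft sample with probability min(1, p(x)/q(x)), and on rejection draw from the residual distribution max(0, p - q) by rejecting against the target. This is a minimal illustrative sketch, not the paper's implementation; all function names and the residual-sampling strategy here are assumptions.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def speculative_gaussian_sample(rng, mu_draft, mu_target, sigma):
    """Draw exactly from the target N(mu_target, sigma^2) using a draft
    proposal N(mu_draft, sigma^2), speculative-sampling style.

    Returns (sample, accepted_draft): accepted_draft is True when the
    cheap draft sample was kept, so a high acceptance rate means fewer
    corrective draws (the source of ASD-style speedups).
    Hypothetical sketch -- not the paper's Algorithm 3 (GRS).
    """
    # Speculate: sample from the draft distribution.
    x = rng.gauss(mu_draft, sigma)
    p = gaussian_pdf(x, mu_target, sigma)
    q = gaussian_pdf(x, mu_draft, sigma)
    # Verify: accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p / q):
        return x, True
    # Rejected: sample the residual max(0, p - q) by rejection,
    # using the target density itself as the envelope.
    while True:
        y = rng.gauss(mu_target, sigma)
        accept = max(0.0, 1.0 - gaussian_pdf(y, mu_draft, sigma) / gaussian_pdf(y, mu_target, sigma))
        if rng.random() < accept:
            return y, False
```

Because the accept/reject correction is exact, the returned samples follow the target Gaussian regardless of how far the draft mean is from the target mean; the draft quality only affects the acceptance rate, mirroring how ASD's speedup depends on how often speculated denoising steps are accepted.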