Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Auto Speculation
Authors: Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical contributions with extensive empirical evaluations on diffusion models for image generation and robot control tasks. ASD leads to a 1.8-4× speedup in wall-clock time without any loss in quality. In the experiments, we empirically demonstrate the practical benefits of autospeculative decoding (ASD). We consider a diverse set of real-world applications where diffusion models are used, including image generation with both latent and pixel-space diffusion models (Rombach et al., 2022; Ho et al., 2020) and robot control with diffusion policies (Reuss et al., 2023; Chi et al., 2023). |
| Researcher Affiliation | Academia | 1Computer Science Department, Stanford University, California, USA. Correspondence to: Hengyuan Hu <EMAIL>, Aniket Das <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Autospeculative Decoding (ASD) Algorithm 2: Verifier Algorithm 3: Gaussian Rejection Sampler (GRS) |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code. It mentions using open-sourced models/libraries like Stable Diffusion-v2 from the diffusers library, but this refers to tools they utilized, not their own implementation code. |
| Open Datasets | Yes | We use the open-sourced Stable Diffusion-v2 (Rombach et al., 2022; Schuhmann et al., 2022) model from the diffusers (von Platen et al., 2022) library... using 5000 images generated with captions from the COCO2017 captions validation dataset. ...We also evaluate ASD on the LSUN Church model from Ho et al. (2020)... We consider three hard Robomimic (Mandlekar et al., 2021) simulation environments, namely Square, Transport and Tool Hang. |
| Dataset Splits | Yes | The CLIP scores are computed over 1000 captions from the COCO2017 captions validation dataset. Each score is computed with 5000 image samples. In each environment, we evaluate the same diffusion policy with different sampling schemes over the same set of 100 seeds (100 random initial configurations) and repeat three times. |
| Hardware Specification | Yes | We measure the wall-clock speedup on a machine with 8 NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions using the 'diffusers (von Platen et al., 2022) library' but does not specify a version number. No other specific software versions are provided. |
| Experiment Setup | Yes | Fig. 2 shows the algorithmic and wall-clock speedup of ASD over DDPM under 1000 denoising steps. We evaluate ASD with different speculation length θ... We follow prior works to set k = 16 in all environments. ...vanilla DDPM that runs for 100 steps. Empirically, we find that ASD has a much higher acceptance rate for the speculated samples in these cases, leading to a 6-7× algorithmic speedup for ASD. Due to the high acceptance rate, it requires a larger speculation length of 20 or 24 to match the efficiency of ASD. |
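The paper's Algorithm 3 (Gaussian Rejection Sampler) is the core of the verifier: a speculated sample drawn from the draft's Gaussian transition is accepted with probability min(1, p(x)/q(x)) against the target's Gaussian transition. The sketch below is a simplified illustration of that accept/reject step, not the paper's implementation: it assumes equal isotropic variances for draft and target, the function and variable names are hypothetical, and on rejection it falls back to resampling directly from the target rather than sampling the exact residual distribution used in the paper.

```python
import numpy as np


def gaussian_rejection_sample(mu_target, mu_draft, sigma, x, rng):
    """Accept/reject a speculated sample x ~ N(mu_draft, sigma^2 I)
    against the target transition N(mu_target, sigma^2 I).

    Simplified sketch: equal isotropic variances assumed; on rejection
    we resample from the target (not the exact residual distribution).
    Returns (sample, accepted_flag).
    """
    # log p(x)/q(x) for two isotropic Gaussians with shared variance:
    # the normalizing constants cancel, leaving only the quadratic terms.
    log_ratio = (
        np.sum((x - mu_draft) ** 2) - np.sum((x - mu_target) ** 2)
    ) / (2.0 * sigma ** 2)

    # Accept with probability min(1, p/q), evaluated in log space.
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return x, True

    # Rejection fallback (simplified): draw a fresh sample from the target.
    return rng.normal(mu_target, sigma), False
```

When the draft and target means coincide, the ratio is 1 and the speculated sample is always accepted, which is the regime behind the high acceptance rates (and hence the 6-7× algorithmic speedup) the paper reports; the further the draft's mean drifts from the target's, the more speculated steps get rejected and resampled.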