Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
Authors: Yuchen Liang, Xiangzhong Fang, Hanting Chen, Yunhe Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our approach on various resolution datasets in both pixel space and latent space. The Distilled Linear Multistep Solver (DLMS) significantly surpasses previous handcrafted and search-based solvers. With just 5 NFE, we achieve FID scores of 3.23 on CIFAR10, 7.16 on ImageNet-64, 5.44 on LSUN-Bedroom, and 12.52 on MS-COCO, resulting in a 2× sampling acceleration ratio compared to handcrafted solvers. |
| Researcher Affiliation | Collaboration | Yuchen Liang, Xiangzhong Fang (School of Mathematical Sciences, Peking University); Hanting Chen, Yunhe Wang (Huawei Noah's Ark Lab) |
| Pseudocode | Yes | Algorithm 1 Linear Multistep Solver Distillation Algorithm 2 Distilled Solver Sampling |
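The solver-distillation idea behind Algorithm 1 can be sketched in miniature: learn a prediction coefficient so that a linear update matches ground-truth trajectory points. The toy ODE dx/dt = −x, the single learnable coefficient, and the plain gradient-descent loop below are illustrative assumptions, not the paper's diffusion-model setup, which distills the coefficients of a full order-4 multistep solver against DPM-Solver++ teacher trajectories.

```python
import math

# Illustrative sketch of solver distillation: fit the coefficient c in the
# one-step linear update  x_1 = x_0 + h * c * f(x_0)  so that it reproduces
# a "teacher" trajectory point. Here the teacher is the exact solution of
# the toy ODE dx/dt = -x (an assumption for illustration only).
f = lambda x: -x
h = 0.5
x0 = 1.0
x_target = x0 * math.exp(-h)   # exact trajectory point, playing the teacher

c = 1.0                        # initialize at the handcrafted Euler coefficient
lr = 0.5
for _ in range(200):           # gradient descent on the squared trajectory error
    pred = x0 + h * c * f(x0)
    grad = 2 * (pred - x_target) * h * f(x0)
    c -= lr * grad

# The distilled coefficient absorbs the truncation error of the handcrafted
# one: the update with the learned c lands much closer to the teacher point.
```

The same principle scales up in the paper: the free parameters are the multistep prediction coefficients (plus time steps and scaling factors), and the regression targets are teacher trajectory points rather than a closed-form solution.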
| Open Source Code | No | All results are obtained from an open-source toolbox1, utilizing the recommended settings from the original papers. (Footnote 1: https://github.com/zju-pi/diff-sampler.) |
| Open Datasets | Yes | We conducted experiments across multiple datasets with resolutions ranging from 32 to 512... The sampling on CIFAR10 (Krizhevsky et al., 2009) 32×32, FFHQ (Karras et al., 2019) 64×64, and ImageNet-64 (Deng et al., 2009) 64×64 is based on the pretrained pixel-space diffusion model provided by EDM (Karras et al., 2022). The unconditional sampling on LSUN-Bedroom (Yu et al., 2015) 256×256 is based on the pretrained latent-space diffusion model provided by Latent Diffusion (Rombach et al., 2022). The text-to-image sampling on MS-COCO (2014) (Lin et al., 2014) 512×512 is based on the pretrained latent-space diffusion model provided by Stable Diffusion v1.5 (Rombach et al., 2022). |
| Dataset Splits | Yes | We measure sample quality using the FID score calculated on 50k generated images. The distillation times are approximately 40·NFE, 80·NFE, and 150·NFE seconds, respectively, on 8 NVIDIA V100 GPUs; in one setting they are approximately 3·NFE minutes on 8 NVIDIA V100 GPUs. FID is also calculated on 10k generated images in one setting, and on 30k generated images generated by 30k prompts from the MS-COCO validation set. |
| Hardware Specification | Yes | Our framework has the ability to complete a solver distillation for Stable-Diffusion in less than 1.5h on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | We use Adam as the optimizer with a learning rate of 5×10⁻³. We use DPM-Solver++ (Lu et al., 2022b) to generate ground truth trajectories. We initialized the prediction coefficients with PLMS (Zhang & Chen, 2022; Liu et al., 2022). |
| Experiment Setup | Yes | We uniformly use the noise schedule αt = 1, σt = t from Karras et al. (2022). We initialized the prediction coefficients with PLMS (Zhang & Chen, 2022; Liu et al., 2022), using a uniform time schedule (Ho et al., 2020) and time scaling factors of 1. We use DPM-Solver++ (Lu et al., 2022b) to generate ground truth trajectories. The designer network gϕ consists of a two-layer MLP with a total parameter count of only 9k. We use Adam as the optimizer with a learning rate of 5×10⁻³. The order p for student solver DLMS is set to 4. The number of interpolation time steps M is set to 4. |
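To make the setup concrete, the order-4 multistep update that PLMS initialization implies can be sketched as follows. This is a hedged toy illustration: the Adams-Bashforth/PLMS coefficients, warm-up logic, and the placeholder ODE dx/dt = −x stand in for the paper's probability-flow ODE, and in DLMS these fixed coefficients (along with the time schedule) would be replaced by distilled values.

```python
import math

# PLMS-style linear multistep step: x_{n+1} = x_n + h * sum_k c_k * f_{n-k}.
# Coefficients are the handcrafted Adams-Bashforth values (newest first),
# matching the order p = 4 used to initialize the student solver.
AB = {1: [1.0],
      2: [3/2, -1/2],
      3: [23/12, -16/12, 5/12],
      4: [55/24, -59/24, 37/24, -9/24]}

def lms_step(x, history, h):
    """One multistep update, using as many past derivatives as are available
    (lower-order coefficients cover the warm-up steps)."""
    c = AB[min(len(history), 4)]
    return x + h * sum(ci * fi for ci, fi in zip(c, history))

# Toy usage: integrate dx/dt = -x from x(0) = 1 over [0, 1] in 100 steps.
f = lambda x: -x
x, h, hist = 1.0, 0.01, []
for _ in range(100):
    hist = [f(x)] + hist[:3]   # keep the four most recent derivative evaluations
    x = lms_step(x, hist, h)
# x closely approximates exp(-1)
```

Reusing past derivative evaluations is what lets multistep solvers reach higher order at one model evaluation (NFE) per step, which is why distilling their coefficients pays off in the low-NFE regime the paper targets.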