Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
Authors: Yuchen Liang, Xiangzhong Fang, Hanting Chen, Yunhe Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our approach on various resolution datasets in both pixel space and latent space. The Distilled Linear Multistep Solver (DLMS) significantly surpasses previous handcrafted and search-based solvers. With just 5 NFE, we achieve FID scores of 3.23 on CIFAR10, 7.16 on ImageNet-64, 5.44 on LSUN-Bedroom, and 12.52 on MS-COCO, resulting in a 2× sampling acceleration ratio compared to handcrafted solvers. |
| Researcher Affiliation | Collaboration | Yuchen Liang, Xiangzhong Fang (School of Mathematical Sciences, Peking University); Hanting Chen, Yunhe Wang (Huawei Noah's Ark Lab) |
| Pseudocode | Yes | Algorithm 1 Linear Multistep Solver Distillation Algorithm 2 Distilled Solver Sampling |
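The solver-distillation idea behind Algorithm 1 can be sketched in miniature: learn a prediction coefficient so that a linear update matches ground-truth trajectory points. The toy ODE dx/dt = −x, the single learnable coefficient, and the plain gradient-descent loop below are illustrative assumptions, not the paper's diffusion-model setup, which distills the coefficients of a full order-4 multistep solver against DPM-Solver++ teacher trajectories.

```python
import math

# Illustrative sketch of solver distillation: fit the coefficient c in the
# one-step linear update  x_1 = x_0 + h * c * f(x_0)  so that it reproduces
# a "teacher" trajectory point. Here the teacher is the exact solution of
# the toy ODE dx/dt = -x (an assumption for illustration only).
f = lambda x: -x
h = 0.5
x0 = 1.0
x_target = x0 * math.exp(-h)   # exact trajectory point, playing the teacher

c = 1.0                        # initialize at the handcrafted Euler coefficient
lr = 0.5
for _ in range(200):           # gradient descent on the squared trajectory error
    pred = x0 + h * c * f(x0)
    grad = 2 * (pred - x_target) * h * f(x0)
    c -= lr * grad

# The distilled coefficient absorbs the truncation error of the handcrafted
# one: the update with the learned c lands much closer to the teacher point.
```

The same principle scales up in the paper: the free parameters are the multistep prediction coefficients (plus time steps and scaling factors), and the regression targets are teacher trajectory points rather than a closed-form solution.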
| Open Source Code | No | All results are obtained from an open-source toolbox1, utilizing the recommended settings from the original papers. (Footnote 1: https://github.com/zju-pi/diff-sampler.) |
| Open Datasets | Yes | We conducted experiments across multiple datasets with resolutions ranging from 32 to 512... The sampling on CIFAR10 (Krizhevsky et al., 2009) 32×32, FFHQ (Karras et al., 2019) 64×64, and ImageNet-64 (Deng et al., 2009) 64×64 is based on the pretrained pixel-space diffusion model provided by EDM (Karras et al., 2022). The unconditional sampling on LSUN-Bedroom (Yu et al., 2015) 256×256 is based on the pretrained latent-space diffusion model provided by Latent Diffusion (Rombach et al., 2022). The text-to-image sampling on MS-COCO (2014) (Lin et al., 2014) 512×512 is based on the pretrained latent-space diffusion model provided by Stable Diffusion v1.5 (Rombach et al., 2022). |
| Dataset Splits | Yes | We measure sample quality using the FID score calculated on 50k generated images. The distillation times are approximately 40·NFE, 80·NFE, and 150·NFE seconds, respectively, on 8 NVIDIA V100 GPUs; in one setting they are approximately 3·NFE minutes on 8 NVIDIA V100 GPUs. FID is also calculated on 10k generated images in one setting, and on 30k generated images generated by 30k prompts from the MS-COCO validation set. |
| Hardware Specification | Yes | Our framework has the ability to complete a solver distillation for Stable-Diffusion in less than 1.5h on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | We use Adam as the optimizer with a learning rate of 5×10⁻³. We use DPM-Solver++ (Lu et al., 2022b) to generate ground truth trajectories. We initialized the prediction coefficients with PLMS (Zhang & Chen, 2022; Liu et al., 2022). |
| Experiment Setup | Yes | We uniformly use the noise schedule αt = 1, σt = t from Karras et al. (2022). We initialized the prediction coefficients with PLMS (Zhang & Chen, 2022; Liu et al., 2022), using a uniform time schedule (Ho et al., 2020) and time scaling factors of 1. We use DPM-Solver++ (Lu et al., 2022b) to generate ground truth trajectories. The designer network gϕ consists of a two-layer MLP with a total parameter count of only 9k. We use Adam as the optimizer with a learning rate of 5×10⁻³. The order p for student solver DLMS is set to 4. The number of interpolation time steps M is set to 4. |
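To make the setup concrete, the order-4 multistep update that PLMS initialization implies can be sketched as follows. This is a hedged toy illustration: the Adams-Bashforth/PLMS coefficients, warm-up logic, and the placeholder ODE dx/dt = −x stand in for the paper's probability-flow ODE, and in DLMS these fixed coefficients (along with the time schedule) would be replaced by distilled values.

```python
import math

# PLMS-style linear multistep step: x_{n+1} = x_n + h * sum_k c_k * f_{n-k}.
# Coefficients are the handcrafted Adams-Bashforth values (newest first),
# matching the order p = 4 used to initialize the student solver.
AB = {1: [1.0],
      2: [3/2, -1/2],
      3: [23/12, -16/12, 5/12],
      4: [55/24, -59/24, 37/24, -9/24]}

def lms_step(x, history, h):
    """One multistep update, using as many past derivatives as are available
    (lower-order coefficients cover the warm-up steps)."""
    c = AB[min(len(history), 4)]
    return x + h * sum(ci * fi for ci, fi in zip(c, history))

# Toy usage: integrate dx/dt = -x from x(0) = 1 over [0, 1] in 100 steps.
f = lambda x: -x
x, h, hist = 1.0, 0.01, []
for _ in range(100):
    hist = [f(x)] + hist[:3]   # keep the four most recent derivative evaluations
    x = lms_step(x, hist, h)
# x closely approximates exp(-1)
```

Reusing past derivative evaluations is what lets multistep solvers reach higher order at one model evaluation (NFE) per step, which is why distilling their coefficients pays off in the low-NFE regime the paper targets.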