SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

Authors: Hongjian Liu, Qingsong Xie, Tianxiang Ye, Zhijie Deng, Chen Chen, Shixiang Tang, Xueyang Fu, Haonan Lu, Zheng-Jun Zha

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps, surpassing that of the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to a 16% improvement in a qualified metric.

Researcher Affiliation | Collaboration | Hongjian Liu* (University of Science and Technology of China), Qingsong Xie* (OPPO AI Center), Tianxiang Ye (Shanghai Jiao Tong University), Zhijie Deng (Shanghai Jiao Tong University), Chen Chen (OPPO AI Center), Shixiang Tang (The Chinese University of Hong Kong), Xueyang Fu (University of Science and Technology of China), Haonan Lu (OPPO AI Center), Zheng-Jun Zha (University of Science and Technology of China)

Pseudocode | No | The paper describes the methodology in prose, mathematical equations, and diagrams (Figure 2), but does not include an explicitly labeled pseudocode or algorithm block in the main text.

Open Source Code | No | The paper provides no explicit statement about the release of its source code and no link to a code repository.

Open Datasets | Yes | Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps... We use LAION-Aesthetics-6+ dataset (Schuhmann et al. 2022).

Dataset Splits | Yes | On the MSCOCO-2017 5K validation dataset with a Stable Diffusion-V1.5 (SD1.5) (Rombach et al. 2022) teacher, our 2-step method achieves an FID (Heusel et al. 2017) of 21.9... Comparison on MSCOCO-2014 30K... Comparison on MJHQ-5K validation dataset...

Hardware Specification | Yes | We train SCott with 4 A100 GPUs and a batch size of 40 for 40K iterations.

Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python or PyTorch versions) that would be needed to replicate the experiments.

Experiment Setup | Yes | We train SCott with 4 A100 GPUs and a batch size of 40 for 40K iterations. The learning rate is 8e-6 for SCott and 2e-5 for the discriminator. In practice, we set λ_adv = 0.4 to control the strength of the discriminator for refining the outputs of f_θ. Empirically, we set tm = tn 24 and h = 3.
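The reported training hyperparameters can be collected into a minimal configuration sketch. Since the paper releases no code, every identifier below (`TRAIN_CONFIG`, `total_loss`, `adv_weight`, etc.) is an illustrative assumption, not the authors' API; only the numeric values come from the paper's setup description.

```python
# Hedged sketch of the SCott training setup as reported in the paper.
# All names are hypothetical; the paper does not release code.

TRAIN_CONFIG = {
    "gpus": 4,                  # 4x A100 GPUs
    "batch_size": 40,
    "iterations": 40_000,       # 40K training iterations
    "lr_scott": 8e-6,           # learning rate for the SCott student
    "lr_discriminator": 2e-5,   # learning rate for the discriminator
    "adv_weight": 0.4,          # lambda_adv, adversarial-loss strength
    "h": 3,                     # hyperparameter h from the paper
}


def total_loss(consistency_loss: float, adv_loss: float,
               adv_weight: float = TRAIN_CONFIG["adv_weight"]) -> float:
    """Combine the consistency-distillation loss with the adversarial
    term, weighted by lambda_adv = 0.4 as reported in the paper."""
    return consistency_loss + adv_weight * adv_loss
```

The weighting mirrors the paper's statement that λ_adv controls how strongly the discriminator refines the outputs of f_θ; the additive combination is a common pattern for such objectives, assumed here rather than quoted from the paper.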