SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
Authors: Hongjian Liu, Qingsong Xie, Tianxiang Ye, Zhijie Deng, Chen Chen, Shixiang Tang, Xueyang Fu, Haonan Lu, Zheng-Jun Zha
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps, surpassing that of the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to a 16% improvement in a qualified metric. |
| Researcher Affiliation | Collaboration | Hongjian Liu (1*), Qingsong Xie (2*), Tianxiang Ye (3), Zhijie Deng (3), Chen Chen (2), Shixiang Tang (4), Xueyang Fu (1), Haonan Lu (2), Zheng-Jun Zha (1). Affiliations: (1) University of Science and Technology of China; (2) OPPO AI Center; (3) Shanghai Jiao Tong University, China; (4) The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes the methodology in prose and through mathematical equations and diagrams (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks in the main text. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of their source code or a link to a code repository. |
| Open Datasets | Yes | Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps... We use LAION-Aesthetics-6+ dataset (Schuhmann et al. 2022). |
| Dataset Splits | Yes | Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9 with 2 sampling steps... On MSCOCO2017 5K validation dataset with a Stable Diffusion-V1.5 (SD1.5) (Rombach et al. 2022) teacher, our 2-step method achieves an FID (Heusel et al. 2017) of 21.9... Comparison on MSCOCO-2014 30K... Comparison on MJHQ-5K validation dataset... |
| Hardware Specification | Yes | We train SCott with 4 A100 GPUs and a batch size of 40 for 40K iterations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as Python or PyTorch versions, that would be needed to replicate the experiment. |
| Experiment Setup | Yes | We train SCott with 4 A100 GPUs and a batch size of 40 for 40K iterations. The learning rate is 8e-6 for SCott and 2e-5 for the discriminator. In practice, we set λ_adv = 0.4 to control the strength of the discriminator for refining the outputs of f_θ. Empirically, we set t_m = t_n/24 and h = 3. |
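The reported setup can be collected into a minimal configuration sketch. This is an illustrative Python snippet, not the authors' code: all names (`scott_train_config`, `effective_samples_seen`) are hypothetical, and only the numeric values come from the quoted setup above.

```python
# Hypothetical configuration mirroring the hyperparameters reported for SCott.
# Field names are illustrative; values are taken from the paper's setup.
scott_train_config = {
    "num_gpus": 4,            # A100 GPUs
    "batch_size": 40,
    "max_iterations": 40_000,
    "lr_student": 8e-6,       # learning rate for SCott
    "lr_discriminator": 2e-5,
    "lambda_adv": 0.4,        # adversarial-loss weight on the student outputs
    "h": 3,
}

def effective_samples_seen(cfg: dict) -> int:
    """Total training samples processed = batch size x iterations."""
    return cfg["batch_size"] * cfg["max_iterations"]
```

With a batch size of 40 over 40K iterations, the run processes 1.6M samples in total, which gives a rough sense of the distillation budget relative to the teacher's pretraining.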