Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
Authors: Cheng Lu, Yang Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64×64, and 1.88 on ImageNet 512×512, narrowing the gap in FID scores with the best existing diffusion models to within 10%. ... We train sCMs on CIFAR-10, ImageNet 64×64, and ImageNet 512×512, reaching an unprecedented scale with 1.5 billion parameters, the largest CMs trained to date (samples in Figure 2). ... In Tables 1 and 2, we compare our results with previous methods by benchmarking the FIDs and the number of function evaluations (NFEs). |
| Researcher Affiliation | Industry | Cheng Lu & Yang Song, OpenAI |
| Pseudocode | Yes | Algorithm 1: Simplified and Stabilized Continuous-time Consistency Models (sCM). |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their source code for the methodology described in this paper. While it refers to 'Alpha-VLLM. Large-DiT-ImageNet. https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/f7fe19834b23e38f333403b91bb0330afe19f79e/Large-DiT-ImageNet, 2024.' in the bibliography, this appears to be a reference to a third-party tool or platform they used, not their own implementation code. |
| Open Datasets | Yes | Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64×64, and 1.88 on ImageNet 512×512, narrowing the gap in FID scores with the best existing diffusion models to within 10%. (Krizhevsky, 2009) (Deng et al., 2009) |
| Dataset Splits | Yes | We train sCMs on CIFAR-10, ImageNet 64×64, and ImageNet 512×512... CIFAR-10. Our architecture is based on the Score SDE (Song et al., 2021b) architecture (DDPM++). We use the same settings of EDM (Karras et al., 2022)... ImageNet 64×64. We preprocess the ImageNet dataset following Dhariwal & Nichol (2021)... ImageNet 512×512. We preprocess the ImageNet dataset following Dhariwal & Nichol (2021) and Karras et al. (2024)... The use of well-known benchmark datasets like CIFAR-10 and ImageNet, and adherence to settings from prior works, implies the use of their standard, well-defined dataset splits. |
| Hardware Specification | No | The paper mentions training with 'half-precision (FP16)' and using 'Flash Attention' which are related to GPU capabilities, and states 'saving the GPU memory'. However, it does not provide specific details such as exact GPU models, CPU models, or processor types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019) auto-grad' in Appendix F, but it does not specify a version number for PyTorch or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | CIFAR-10. ... dropout rate is 0.13, batch size is 512, number of training iterations is 400k, learning rate is 0.001, Adam ϵ = 10⁻⁸, β1 = 0.9, β2 = 0.999. ... ImageNet 64×64. ... use Adam ϵ = 10⁻¹¹. ... ImageNet 512×512. ... use Adam ϵ = 10⁻¹¹. We enable label dropout with rate 0.1 to support classifier-free guidance. ... Tables 4 and 5 in the appendix provide detailed training settings for different model sizes and datasets, including batch size, channel multiplier, learning rate, dropout probability, proposal parameters (Pmean, Pstd), tangent normalization constant, tangent warmup iterations, and EMA length. |
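To make the reported optimizer settings concrete, here is a minimal pure-Python sketch of a single Adam update using the CIFAR-10 hyperparameters quoted above (lr = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸). The scalar interface and function name are illustrative only; the paper's actual training updates full parameter tensors (via PyTorch autograd, per Appendix F), and the ImageNet runs use ϵ = 10⁻¹¹ instead.

```python
# One Adam update step with the CIFAR-10 settings reported in the paper:
# lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8.
# Scalar interface for illustration; real training updates parameter tensors.

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (theta, m, v) after one bias-corrected Adam step at iteration t."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA of gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for warm-up
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# First step from zero moments: the update magnitude is approximately lr.
theta, m, v = adam_step(theta=0.5, grad=2.0, m=0.0, v=0.0, t=1)
```

A small ϵ such as the 10⁻¹¹ used for the ImageNet runs keeps the denominator closer to the raw second-moment estimate, which matters when gradients (and hence v_hat) are very small.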