Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
Authors: Cheng Lu, Yang Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64×64, and 1.88 on ImageNet 512×512, narrowing the gap in FID scores with the best existing diffusion models to within 10%. ... We train sCMs on CIFAR-10, ImageNet 64×64, and ImageNet 512×512, reaching an unprecedented scale with 1.5 billion parameters, the largest CMs trained to date (samples in Figure 2). ... In Tables 1 and 2, we compare our results with previous methods by benchmarking the FIDs and the number of function evaluations (NFEs). |
| Researcher Affiliation | Industry | Cheng Lu & Yang Song, OpenAI |
| Pseudocode | Yes | Algorithm 1: Simplified and Stabilized Continuous-time Consistency Models (sCM). |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their source code for the methodology described in this paper. While it refers to 'Alpha-VLLM. Large-DiT-ImageNet. https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/f7fe19834b23e38f333403b91bb0330afe19f79e/Large-DiT-ImageNet, 2024.' in the bibliography, this appears to be a reference to a third-party tool or platform they used, not their own implementation code. |
| Open Datasets | Yes | Our proposed training algorithm, using only two sampling steps, achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64×64, and 1.88 on ImageNet 512×512, narrowing the gap in FID scores with the best existing diffusion models to within 10%. (Krizhevsky, 2009) (Deng et al., 2009) |
| Dataset Splits | Yes | We train sCMs on CIFAR-10, ImageNet 64×64, and ImageNet 512×512... CIFAR-10. Our architecture is based on the Score SDE (Song et al., 2021b) architecture (DDPM++). We use the same settings of EDM (Karras et al., 2022)... ImageNet 64×64. We preprocess the ImageNet dataset following Dhariwal & Nichol (2021)... ImageNet 512×512. We preprocess the ImageNet dataset following Dhariwal & Nichol (2021) and Karras et al. (2024)... The use of well-known benchmark datasets like CIFAR-10 and ImageNet, and adherence to settings from prior works, implies the use of their standard, well-defined dataset splits. |
| Hardware Specification | No | The paper mentions training with 'half-precision (FP16)' and using 'Flash Attention' which are related to GPU capabilities, and states 'saving the GPU memory'. However, it does not provide specific details such as exact GPU models, CPU models, or processor types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019) auto-grad' in Appendix F, but it does not specify a version number for PyTorch or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | CIFAR-10. ... dropout rate is 0.13, batch size is 512, number of training iterations is 400k, learning rate is 0.001, Adam ϵ = 10⁻⁸, β1 = 0.9, β2 = 0.999. ... ImageNet 64×64. ... use Adam ϵ = 10⁻¹¹. ... ImageNet 512×512. ... use Adam ϵ = 10⁻¹¹. We enable label dropout with rate 0.1 to support classifier-free guidance. ... Tables 4 and 5 in the appendix provide detailed training settings for different model sizes and datasets, including batch size, channel multiplier, learning rate, dropout probability, proposal parameters (Pmean, Pstd), tangent normalization constant, tangent warmup iterations, and EMA length. |
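To make the reported optimizer settings concrete, here is a minimal pure-Python sketch of a single Adam update using the CIFAR-10 hyperparameters quoted above (lr = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸). The scalar interface and function name are illustrative only; the paper's actual training updates full parameter tensors (via PyTorch autograd, per Appendix F), and the ImageNet runs use ϵ = 10⁻¹¹ instead.

```python
# One Adam update step with the CIFAR-10 settings reported in the paper:
# lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8.
# Scalar interface for illustration; real training updates parameter tensors.

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (theta, m, v) after one bias-corrected Adam step at iteration t."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA of gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for warm-up
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# First step from zero moments: the update magnitude is approximately lr.
theta, m, v = adam_step(theta=0.5, grad=2.0, m=0.0, v=0.0, t=1)
```

A small ϵ such as the 10⁻¹¹ used for the ImageNet runs keeps the denominator closer to the raw second-moment estimate, which matters when gradients (and hence v_hat) are very small.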