Consistency Models Made Easy
Authors: Zhengyang Geng, Ashwini Pokle, Weijian Luo, Justin Lin, Zico Kolter
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficiency and scalability of ECT on two datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet 64×64 (Deng et al., 2009). We measure the sample quality using Fréchet Inception Distance (FID) (Heusel et al., 2017) and Fréchet Distance under the DINOv2 model (Oquab et al., 2023) (FD-DINOv2) (Stein et al., 2024), and sampling efficiency using the number of function evaluations (NFEs). |
| Researcher Affiliation | Academia | Zhengyang Geng1 Ashwini Pokle1 Weijian Luo2 Justin Lin1 J. Zico Kolter1 1CMU 2Peking University |
| Pseudocode | Yes | Algorithm 1 Easy Consistency Tuning (ECT) Algorithm 2 Easy Consistency Distillation (ECD) |
| Open Source Code | Yes | Our code is available for future prototyping, studying, and deploying consistency models within the community. |
| Open Datasets | Yes | We evaluate the efficiency and scalability of ECT on two datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet 64×64 (Deng et al., 2009). We further extend our method to latent space and conduct experiments on ImageNet 512×512. |
| Dataset Splits | No | The paper mentions a "budget of 12.8M training images" and evaluates metrics such as FID on 50k sampled images, but it does not explicitly specify how the input datasets (CIFAR-10, ImageNet 64×64, ImageNet 512×512) are split into training, validation, and test sets for reproducibility purposes beyond implicitly using the standard splits. The text "We train multiple ECMs with different choices of batch sizes and training iterations. By default, ECT utilizes a batch size of 128 and 100k iterations, leading to a training budget of 12.8M on ImageNet 64×64." refers to the training data quantity but not to specific train/validation/test splits. |
| Hardware Specification | Yes | ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU. Table 8: Number of GPUs: 1; GPU types: A6000, H100 |
| Software Dependencies | No | We use the RAdam (Liu et al., 2019) optimizer for experiments on CIFAR-10 and the Adam (Kingma and Ba, 2014) optimizer for experiments on ImageNet 64×64. We set β to (0.9, 0.999) for CIFAR-10 and (0.9, 0.99) for ImageNet 64×64. While specific optimizers are mentioned and cited, the paper does not provide version numbers for general software components (e.g., Python, PyTorch, CUDA, operating system). |
| Experiment Setup | Yes | Table 8: Model channels, minibatch size, iterations, dropout probability, dropout feature resolution, learning rate max (αref), learning rate decay (tref), EMA beta, regular weighting (w(t)), adaptive weighting (w(·)), adaptive weighting smoothing (c), noise distribution mean (Pmean), noise distribution std (Pstd). We set β to (0.9, 0.999) for CIFAR-10 and (0.9, 0.99) for ImageNet 64×64. All the hyperparameters are indicated in Tab. 8. We do not use any learning rate decay, weight decay, or warmup on CIFAR-10. We follow EDM2 (Karras et al., 2024) to apply an inverse square root learning rate decay schedule on ImageNet 64×64. |
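The setup row above references the inverse square root learning rate decay schedule of EDM2 (Karras et al., 2024), parameterized by a maximum learning rate (αref) and a decay reference point (tref). A minimal sketch of that schedule, assuming EDM2's published form lr(t) = αref / sqrt(max(t/tref, 1)); the variable names and example constants here are illustrative, not taken from the paper's Table 8:

```python
import math

def inv_sqrt_lr(t: float, lr_max: float, t_ref: float) -> float:
    """Inverse square root decay schedule (EDM2-style).

    Holds the learning rate constant at lr_max until t reaches t_ref
    (e.g., a count of training images or iterations), then decays it
    proportionally to 1 / sqrt(t / t_ref).
    """
    return lr_max / math.sqrt(max(t / t_ref, 1.0))

# Illustrative values: lr stays flat before t_ref, then decays.
print(inv_sqrt_lr(0, 1e-2, 2000))     # before t_ref: 0.01
print(inv_sqrt_lr(8000, 1e-2, 2000))  # at 4 * t_ref: 0.005
```

Note that, per the paper, this decay is applied only on ImageNet 64×64; the CIFAR-10 runs use no learning rate decay, weight decay, or warmup.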