Consistency Models Made Easy
Authors: Zhengyang Geng, Ashwini Pokle, Weijian Luo, Justin Lin, Zico Kolter
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficiency and scalability of ECT on two datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet 64×64 (Deng et al., 2009). We measure the sample quality using Fréchet Inception Distance (FID) (Heusel et al., 2017) and Fréchet Distance under the DINOv2 model (Oquab et al., 2023) (FD-DINOv2) (Stein et al., 2024), and sampling efficiency using the number of function evaluations (NFEs). |
| Researcher Affiliation | Academia | Zhengyang Geng1 Ashwini Pokle1 Weijian Luo2 Justin Lin1 J. Zico Kolter1 1CMU 2Peking University |
| Pseudocode | Yes | Algorithm 1 Easy Consistency Tuning (ECT) Algorithm 2 Easy Consistency Distillation (ECD) |
| Open Source Code | Yes | Our code is available for future prototyping, studying, and deploying consistency models within the community. |
| Open Datasets | Yes | We evaluate the efficiency and scalability of ECT on two datasets: CIFAR-10 (Krizhevsky, 2009) and ImageNet 64×64 (Deng et al., 2009). We further extend our method to latent space and conduct experiments on ImageNet 512×512. |
| Dataset Splits | No | The paper mentions a "budget of 12.8M training images" and evaluates metrics such as FID on 50k sampled images, but it does not explicitly specify how the input datasets (CIFAR-10, ImageNet 64×64, ImageNet 512×512) are split into training, validation, and test sets for reproducibility purposes beyond implicitly using the standard splits. The text "We train multiple ECMs with different choices of batch sizes and training iterations. By default, ECT utilizes a batch size of 128 and 100k iterations, leading to a training budget of 12.8M on ImageNet 64×64." refers to the training data quantity but not to specific train/validation/test splits. |
| Hardware Specification | Yes | ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU. Table 8: Number of GPUs: 1; GPU types: A6000, H100 |
| Software Dependencies | No | We use the RAdam (Liu et al., 2019) optimizer for experiments on CIFAR-10 and the Adam (Kingma and Ba, 2014) optimizer for experiments on ImageNet 64×64. We set β to (0.9, 0.999) for CIFAR-10 and (0.9, 0.99) for ImageNet 64×64. While specific optimizers are mentioned and cited, the paper does not provide version numbers for general software components (e.g., Python, PyTorch, CUDA, operating system). |
| Experiment Setup | Yes | Table 8: Model channels, minibatch size, iterations, dropout probability, dropout feature resolution, learning rate max (αref), learning rate decay (tref), EMA beta, regular weighting (w(t)), adaptive weighting (w(·)), adaptive weighting smoothing (c), noise distribution mean (Pmean), noise distribution std (Pstd). We set β to (0.9, 0.999) for CIFAR-10 and (0.9, 0.99) for ImageNet 64×64. All the hyperparameters are indicated in Tab. 8. We do not use any learning rate decay, weight decay, or warmup on CIFAR-10. We follow EDM2 (Karras et al., 2024) to apply an inverse square root learning rate decay schedule on ImageNet 64×64. |
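The setup row above references the inverse square root learning rate decay schedule of EDM2 (Karras et al., 2024), parameterized by a maximum learning rate (αref) and a decay reference point (tref). A minimal sketch of that schedule, assuming EDM2's published form lr(t) = αref / sqrt(max(t/tref, 1)); the variable names and example constants here are illustrative, not taken from the paper's Table 8:

```python
import math

def inv_sqrt_lr(t: float, lr_max: float, t_ref: float) -> float:
    """Inverse square root decay schedule (EDM2-style).

    Holds the learning rate constant at lr_max until t reaches t_ref
    (e.g., a count of training images or iterations), then decays it
    proportionally to 1 / sqrt(t / t_ref).
    """
    return lr_max / math.sqrt(max(t / t_ref, 1.0))

# Illustrative values: lr stays flat before t_ref, then decays.
print(inv_sqrt_lr(0, 1e-2, 2000))     # before t_ref: 0.01
print(inv_sqrt_lr(8000, 1e-2, 2000))  # at 4 * t_ref: 0.005
```

Note that, per the paper, this decay is applied only on ImageNet 64×64; the CIFAR-10 runs use no learning rate decay, weight decay, or warmup.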