Elucidating the Preconditioning in Consistency Distillation

Authors: Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of Analytic-Precond by applying it to CMs and CTMs on standard benchmark datasets, including CIFAR-10, FFHQ 64×64 and ImageNet 64×64. While the vanilla preconditioning closely approximates Analytic-Precond and yields similar results in CMs, Analytic-Precond exhibits notable distinctions from its original counterpart in CTMs, particularly concerning intermediate jumps on the trajectory. Remarkably, Analytic-Precond achieves 2× to 3× training acceleration in CTMs in multi-step generation across various datasets. Section 5: EXPERIMENTS. Figure 2: Training curves for single-step generation, and visualization of preconditionings for single-step jump on CIFAR-10 (conditional). Figure 3: Training curves for two-step generation. Table 2: FID results in multi-step generation with different numbers of function evaluations (NFEs).
Researcher Affiliation Collaboration Kaiwen Zheng (1), Guande He (1), Jianfei Chen (1), Fan Bao (1, 2), Jun Zhu (1, 2, 3). (1) Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University, Beijing, China; (2) Shengshu Technology, Beijing; (3) Pazhou Lab (Huangpu), Guangzhou, China. EMAIL; EMAIL; EMAIL; EMAIL
Pseudocode No The paper describes methods using mathematical equations and prose. It does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code No The paper does not provide an explicit statement from the authors about releasing their own source code for the methodology described, nor does it include a direct link to a repository containing their implementation. Table 4 lists code for related works (EDM, CM, CTM) but not for the current paper's contribution.
Open Datasets Yes We demonstrate the effectiveness of Analytic-Precond by applying it to CMs and CTMs on standard benchmark datasets, including CIFAR-10, FFHQ 64×64 and ImageNet 64×64. (Krizhevsky, 2009), (Karras et al., 2019), (Deng et al., 2009). Table 4: The used datasets, codes and their licenses. CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html (Krizhevsky, 2009); FFHQ: https://github.com/NVlabs/ffhq-dataset (Karras et al., 2019), CC BY-NC-SA 4.0; ImageNet: https://www.image-net.org (Deng et al., 2009).
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or exact counts) for the datasets used in its experiments. It mentions using '50K random samples' for FID evaluation, but this refers to generated samples, not splits of the original datasets. The paper states that 'The teacher models are the pretrained diffusion models on the corresponding dataset, provided by EDM', implying reliance on existing model setups rather than newly defined splits.
Hardware Specification Yes We run the experiments on a cluster of NVIDIA A800 GPU cards. For CIFAR-10 (unconditional), we train the model with a batch size of 256 for 200K iterations, which takes 5 days on 4 GPU cards. For CIFAR-10 (conditional), we train the model with a batch size of 512 for 150K iterations, which takes 4 days on 8 GPU cards. For FFHQ 64×64 (unconditional), we train the model with a batch size of 256 for 150K iterations, which takes 5 days on 8 GPU cards. For ImageNet 64×64 (conditional), we train the model with a batch size of 2048 for 60K iterations, which takes 8 days on 32 GPU cards.
Software Dependencies No The paper mentions 'automatic differentiation in modern deep learning frameworks' and refers to the 'Heun sampler in EDM' and 'LPIPS (Zhang et al., 2018) as the distance metric', but it does not specify concrete version numbers for any of these software components (e.g., specific versions of PyTorch, TensorFlow, CUDA, or the LPIPS library) that would be needed for a reproducible setup.
Experiment Setup Yes Appendix B.2 TRAINING DETAILS. Table 3: Experimental configurations (columns: CIFAR-10 uncond. / CIFAR-10 cond. / FFHQ 64×64 / ImageNet 64×64). Learning rate 0.0004; student's stop-grad EMA parameter 0.999; N = 18 / 18 / 18 / 40; ODE solver Heun; max. ODE steps 17 / 17 / 17 / 20; EMA decay rate 0.999 for all; training iterations 200K / 150K / 150K / 60K; mixed precision (FP16) true; batch size 256 / 512 / 256 / 2048. We follow the hyperparameters used in EDM, setting σmin = ε = 0.002, σmax = T = 80.0, σdata = 0.5 and ρ = 7.
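The quoted configuration (σmin = 0.002, σmax = 80.0, ρ = 7, N discretization steps) matches the standard EDM noise-level discretization of Karras et al. (2022). As a minimal illustrative sketch of that schedule (not code from the paper under review), the N noise levels can be computed as:

```python
import numpy as np

def karras_sigmas(n: int = 18, sigma_min: float = 0.002,
                  sigma_max: float = 80.0, rho: float = 7.0) -> np.ndarray:
    """EDM discretization: n noise levels from sigma_max down to sigma_min,
    spaced uniformly in sigma^(1/rho) (Karras et al., 2022)."""
    ramp = np.arange(n) / (n - 1)  # 0 -> 1
    inv_rho_levels = sigma_max ** (1 / rho) + ramp * (
        sigma_min ** (1 / rho) - sigma_max ** (1 / rho)
    )
    return inv_rho_levels ** rho  # monotonically decreasing: sigma_0 = sigma_max

# Example with the CIFAR-10 setting N = 18 from Table 3
sigmas = karras_sigmas(n=18)
```

With the Table 3 values, the first level equals σmax = 80.0 and the last equals σmin = 0.002; the ImageNet 64×64 configuration simply uses n = 40 instead.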