Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation
Authors: Bowen Zheng, Tianming Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the ablation experiments in which different sets of parameters are frozen or tunable during training (Table 3). The results indicate that freezing most of the convolutional layers leads to the best performance. ... Finally, we conduct a comprehensive evaluation of both D2O and D2O-F on CIFAR-10 (Krizhevsky, 2009), AFHQv2 64x64 (Choi et al., 2020), FFHQ 64x64 (Karras et al., 2019), and ImageNet 64x64 (Deng et al., 2009). The results for class-conditional generation and the comparison against previous studies are reported on CIFAR-10 and ImageNet 64x64. The pre-trained diffusion models are from EDM. We report FID for all datasets, Inception Score (IS) (Salimans et al., 2016) for CIFAR-10, and the precision and recall metrics (Kynkäänniemi et al., 2019) for ImageNet 64x64, following previous works. ... The results can be found in Tables 4, 5, and 6. |
| Researcher Affiliation | Academia | 1 Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China. Correspondence to: Tianming Yang <EMAIL>. |
| Pseudocode | No | The paper describes the methods and processes through textual explanations and mathematical equations, such as in Section 2.1 'Diffusion Models' and Appendix B 'Definition of Typical Distillation Methods', but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements regarding the release of source code or links to code repositories. |
| Open Datasets | Yes | Finally, we conduct a comprehensive evaluation of both D2O and D2O-F on CIFAR-10 (Krizhevsky, 2009), AFHQv2 64x64 (Choi et al., 2020), FFHQ 64x64 (Karras et al., 2019), and ImageNet 64x64 (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training data sizes like '5M training images' and '0.2 million training images' (Table 10), and evaluates FID using '50,000 images generated by each model with fixed seeds'. However, it does not explicitly provide specific train/validation/test splits (e.g., percentages or absolute counts) for the datasets used to train the models. |
| Hardware Specification | Yes | All models are trained on a cluster of NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'Mixed precision training... with BFloat16 data type', but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Hyperparameters used for D2O and D2O-F training can be found in Table 10. This table specifies details such as G architecture, G LR, D LR, optimizer (Adam with β1=0, β2=0.99), batch size, γ_r1 (R1 regularization weight), EMA half-life, EMA warmup ratio, mixed precision (BF16), dropout probability, and augmentations, all tailored per dataset. |
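For reference, the reported optimizer settings can be captured in a small configuration sketch. This is illustrative only: the Adam betas and BF16 mixed precision are stated in the paper, while the learning rates, γ_r1 value, and EMA half-life shown here are hypothetical placeholders, since the actual values in Table 10 vary by dataset.

```python
from dataclasses import dataclass

@dataclass
class D2OTrainConfig:
    """Sketch of the training hyperparameters described for D2O/D2O-F.

    Only adam_beta1, adam_beta2, and mixed_precision reflect values
    explicitly stated in the paper; everything else is a placeholder.
    """
    g_lr: float = 1e-4            # generator LR (placeholder)
    d_lr: float = 1e-4            # discriminator LR (placeholder)
    adam_beta1: float = 0.0       # Adam β1 = 0 (stated in Table 10)
    adam_beta2: float = 0.99      # Adam β2 = 0.99 (stated in Table 10)
    batch_size: int = 256         # placeholder; dataset-dependent
    gamma_r1: float = 0.01        # R1 regularization weight (placeholder)
    ema_halflife_kimg: float = 500.0  # EMA half-life (placeholder)
    ema_warmup_ratio: float = 0.05    # EMA warmup ratio (placeholder)
    mixed_precision: str = "bf16"     # BFloat16 training (stated)
    dropout: float = 0.0              # placeholder; dataset-dependent

cfg = D2OTrainConfig()
# These betas would be passed to the optimizer, e.g. in PyTorch:
# torch.optim.Adam(params, lr=cfg.g_lr, betas=(cfg.adam_beta1, cfg.adam_beta2))
```

A config object like this makes the dataset-specific variation in Table 10 easy to express: one instance per dataset, overriding only the fields that differ.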