Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation
Authors: Bowen Zheng, Tianming Yang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the ablation experiments in which different sets of parameters are frozen or tunable during training (Table 3). The results indicate that freezing most of the convolutional layers leads to the best performance. ... Finally, we conduct a comprehensive evaluation of both D2O and D2O-F on CIFAR-10 (Krizhevsky, 2009), AFHQv2 64x64 (Choi et al., 2020), FFHQ 64x64 (Karras et al., 2019), and ImageNet 64x64 (Deng et al., 2009). The results for class-conditional generation and the comparison against previous studies are reported on CIFAR-10 and ImageNet 64x64. The pre-trained diffusion models are from EDM. We report FID for all datasets, Inception Score (IS) (Salimans et al., 2016) for CIFAR-10, and the precision and recall metrics (Kynkäänniemi et al., 2019) for ImageNet 64x64, following previous works. ... The results can be found in Tables 4, 5, and 6. |
| Researcher Affiliation | Academia | 1 Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China. Correspondence to: Tianming Yang <EMAIL>. |
| Pseudocode | No | The paper describes the methods and processes through textual explanations and mathematical equations, such as in Section 2.1 'Diffusion Models' and Appendix B 'Definition of Typical Distillation Methods', but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements regarding the release of source code or links to code repositories. |
| Open Datasets | Yes | Finally, we conduct a comprehensive evaluation of both D2O and D2O-F on CIFAR-10 (Krizhevsky, 2009), AFHQv2 64x64 (Choi et al., 2020), FFHQ 64x64 (Karras et al., 2019), and ImageNet 64x64 (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training data sizes like '5M training images' and '0.2 million training images' (Table 10), and evaluates FID using '50,000 images generated by each model with fixed seeds'. However, it does not explicitly provide specific train/validation/test splits (e.g., percentages or absolute counts) for the datasets used to train the models. |
| Hardware Specification | Yes | All models are trained on a cluster of NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'Mixed precision training... with BFloat16 data type', but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Hyperparameters used for D2O and D2O-F training can be found in Table 10. This table specifies details such as G architecture, G LR, D LR, optimizer (Adam with β1=0, β2=0.99), batch size, γ_r1 (R1 regularization weight), EMA half-life, EMA warmup ratio, mixed precision (BF16), dropout probability, and augmentations, all tailored per dataset. |
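For reference, the reported optimizer settings can be captured in a small configuration sketch. This is illustrative only: the Adam betas and BF16 mixed precision are stated in the paper, while the learning rates, γ_r1 value, and EMA half-life shown here are hypothetical placeholders, since the actual values in Table 10 vary by dataset.

```python
from dataclasses import dataclass

@dataclass
class D2OTrainConfig:
    """Sketch of the training hyperparameters described for D2O/D2O-F.

    Only adam_beta1, adam_beta2, and mixed_precision reflect values
    explicitly stated in the paper; everything else is a placeholder.
    """
    g_lr: float = 1e-4            # generator LR (placeholder)
    d_lr: float = 1e-4            # discriminator LR (placeholder)
    adam_beta1: float = 0.0       # Adam β1 = 0 (stated in Table 10)
    adam_beta2: float = 0.99      # Adam β2 = 0.99 (stated in Table 10)
    batch_size: int = 256         # placeholder; dataset-dependent
    gamma_r1: float = 0.01        # R1 regularization weight (placeholder)
    ema_halflife_kimg: float = 500.0  # EMA half-life (placeholder)
    ema_warmup_ratio: float = 0.05    # EMA warmup ratio (placeholder)
    mixed_precision: str = "bf16"     # BFloat16 training (stated)
    dropout: float = 0.0              # placeholder; dataset-dependent

cfg = D2OTrainConfig()
# These betas would be passed to the optimizer, e.g. in PyTorch:
# torch.optim.Adam(params, lr=cfg.g_lr, betas=(cfg.adam_beta1, cfg.adam_beta2))
```

A config object like this makes the dataset-specific variation in Table 10 easy to express: one instance per dataset, overriding only the fields that differ.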