No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Authors: Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges, Romann Weber

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we rigorously evaluate ICG and demonstrate its ability to simulate the behavior of CFG across several conditional models. Additionally, we show that TSG improves the quality of both conditional and unconditional generations compared to the non-guided sampling baseline. Setup: All experiments are conducted with pre-trained checkpoints provided by official implementations. We use the recommended sampler for each model: the EDM sampler for EDM networks (Karras et al., 2022), DPM++ (Lu et al., 2022b) for Stable Diffusion (Rombach et al., 2022), and DDPM (Ho et al., 2020) for DiT-XL/2 (Peebles & Xie, 2022). Evaluation: We use Fréchet Inception Distance (FID) (Heusel et al., 2017) as the main metric to measure both quality and diversity due to its alignment with human judgment. As FID is known to be sensitive to small implementation details, we ensure that models under comparison follow the same evaluation setup. For completeness, we also report precision (Kynkäänniemi et al., 2019) as a standalone quality metric and recall (Kynkäänniemi et al., 2019) as a diversity metric whenever possible. FD-DINOv2 (Stein et al., 2024) is also reported for the EDM2 model (Karras et al., 2023).
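The guidance mechanism the excerpt evaluates can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: `toy_eps` and `guided_eps` are stand-in names, the combination follows the standard CFG formula, and the ICG step swaps CFG's unconditional branch for a prediction under an independently drawn condition (a random class, random text prompt, or Gaussian noise, per the ICG modes in Table 10), which is what lets the method run without a trained unconditional model.

```python
import numpy as np

def guided_eps(eps_cond, eps_ref, scale):
    """Standard guidance combination: eps_ref + scale * (eps_cond - eps_ref).

    CFG uses the unconditional prediction as eps_ref; ICG instead uses a
    prediction under an independently drawn condition, so no unconditional
    branch ever needs to be trained.
    """
    return eps_ref + scale * (eps_cond - eps_ref)

# Toy stand-in for a denoiser eps(x, c); a real model is a neural network.
def toy_eps(x, c):
    return x + 0.1 * c

rng = np.random.default_rng(0)
x = rng.standard_normal(4)       # current noisy sample
c = np.ones(4)                   # the true condition embedding
c_ind = rng.standard_normal(4)   # independent condition (ICG's replacement)

eps_icg = guided_eps(toy_eps(x, c), toy_eps(x, c_ind), scale=1.5)
```

At `scale = 1.0` the combination reduces to the plain conditional prediction, and larger scales extrapolate away from the independent-condition prediction, mirroring how CFG extrapolates away from the unconditional one.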
Researcher Affiliation | Collaboration | 1 ETH Zürich, 2 Disney Research|Studios. Email: romann.weber@disneyresearch.com
Pseudocode | Yes | The exact algorithms for ICG and TSG are provided in Algorithms 1 and 2, with corresponding pseudocode shown in Figures 11 and 12.
Open Source Code | No | The paper provides algorithms and pseudocode (Algorithms 1 and 2, Figures 11 and 12) and states that additional implementation details and hyperparameters are discussed in Appendix G, but it does not include an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | For text-to-image models, we use the evaluation subset of MS COCO 2017 (Lin et al., 2014) as the ground truth for captions and images.
Dataset Splits | Yes | For class-conditional models, the FID is computed between 10,000 (for DiT-XL/2) or 50,000 (for EDM and EDM2) generated images and the full training dataset. For text-to-image models, we use the evaluation subset of MS COCO 2017 (Lin et al., 2014) as the ground truth for captions and images.
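The FID comparisons described above reduce to a closed-form distance between two Gaussians fitted to image features. A minimal NumPy sketch of that formula (the feature extraction is omitted; a real evaluation uses Inception-v3 activations, and `sqrtm_psd`/`fid` are names chosen here for illustration):

```python
import numpy as np

def sqrtm_psd(a):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(mu1, sigma1, mu2, sigma2):
    # Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    # Tr((S1 S2)^{1/2}) is computed via the symmetric form
    # (S1^{1/2} S2 S1^{1/2})^{1/2} to stay in real arithmetic.
    diff = mu1 - mu2
    s1h = sqrtm_psd(sigma1)
    tr_sqrt = np.trace(sqrtm_psd(s1h @ sigma2 @ s1h))
    return float(diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * tr_sqrt)
```

In practice each (mu, sigma) pair is the mean and covariance of features from the generated set (10,000 or 50,000 images, per the counts above) and from the reference set, which is why the metric is sensitive to implementation details of the feature pipeline.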
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper states: 'All experiments are conducted via pre-trained checkpoints provided by official implementations.' However, it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The hyperparameters used in our experiments are listed in Tables 10 and 11.

Table 10: Hyperparameters used for the ICG experiments.
Model            | ICG mode       | ICG scale | CFG scale
DiT-XL/2         | Random class   | 1.4       | 1.5
Stable Diffusion | Random text    | 3.0       | 4.0
Pose-to-Image    | Gaussian noise | 3.0       | 4.0
MDM              | Gaussian noise | 2.5       | 2.5
EDM              | Random class   | 1.05      | 1.1
EDM2             | Random class   | 1.25      | 1.25

Table 11: Hyperparameters used for the TSG experiments.
Model            | Mode          | TSG function      | TSG scale | TSG parameters
DiT-XL/2         | Unconditional | constant_schedule | 5.0       | T_MIN = 200, T_MAX = 800, s = 1.0
DiT-XL/2         | Conditional   | power_schedule    | 2.5       | T_MIN = 0, T_MAX = 1000, α = 1, s = 2
Stable Diffusion | Unconditional | constant_schedule | 3.0       | T_MIN = 100, T_MAX = 900, s = 1.25
Stable Diffusion | Conditional   | power_schedule    | 4.0       | T_MIN = 400, T_MAX = 1000, s = 3.0, α = 0.25
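The exact definitions of `constant_schedule` and `power_schedule` are in the paper's Appendix G and are not reproduced in this report, so the following is only a hypothetical reading of the parameter names in Table 11: a flat weight `s` inside an active window [T_MIN, T_MAX] for `constant_schedule`, and a power-law ramp with exponent `α` for `power_schedule`. The functional forms here are assumptions for illustration, not the paper's schedules.

```python
def constant_schedule(t, t_min, t_max, s):
    # Hypothetical: flat TSG weight s while t lies in [t_min, t_max],
    # zero outside the window.
    return s if t_min <= t <= t_max else 0.0

def power_schedule(t, t_min, t_max, s, alpha, t_total=1000):
    # Hypothetical: weight ramps as a power of the normalized time step
    # inside the window, reaching s at t = t_total.
    if not (t_min <= t <= t_max):
        return 0.0
    return s * (t / t_total) ** alpha
```

Under this reading, the DiT-XL/2 unconditional row would apply a constant weight only in the middle of the diffusion trajectory (steps 200 to 800), while the conditional rows would weight late (high-t) steps more heavily.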