No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Authors: Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges, Romann Weber

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we rigorously evaluate ICG and demonstrate its ability to simulate the behavior of CFG across several conditional models. Additionally, we show that TSG improves the quality of both conditional and unconditional generations compared to the non-guided sampling baseline. Setup: All experiments are conducted with pre-trained checkpoints provided by official implementations. We use the recommended sampler for each model: the EDM sampler for EDM networks (Karras et al., 2022), DPM++ (Lu et al., 2022b) for Stable Diffusion (Rombach et al., 2022), and DDPM (Ho et al., 2020) for DiT-XL/2 (Peebles & Xie, 2022). Evaluation: We use Fréchet Inception Distance (FID) (Heusel et al., 2017) as the main metric to measure both quality and diversity due to its alignment with human judgment. As FID is known to be sensitive to small implementation details, we ensure that models under comparison follow the same evaluation setup. For completeness, we also report precision (Kynkäänniemi et al., 2019) as a standalone quality metric and recall (Kynkäänniemi et al., 2019) as a diversity metric whenever possible. FD-DINOv2 (Stein et al., 2024) is also reported for the EDM2 model (Karras et al., 2023).
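The guidance mechanism the excerpt evaluates can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: `toy_eps` and `guided_eps` are stand-in names, the combination follows the standard CFG formula, and the ICG step swaps CFG's unconditional branch for a prediction under an independently drawn condition (a random class, random text prompt, or Gaussian noise, per the ICG modes in Table 10), which is what lets the method run without a trained unconditional model.

```python
import numpy as np

def guided_eps(eps_cond, eps_ref, scale):
    """Standard guidance combination: eps_ref + scale * (eps_cond - eps_ref).

    CFG uses the unconditional prediction as eps_ref; ICG instead uses a
    prediction under an independently drawn condition, so no unconditional
    branch ever needs to be trained.
    """
    return eps_ref + scale * (eps_cond - eps_ref)

# Toy stand-in for a denoiser eps(x, c); a real model is a neural network.
def toy_eps(x, c):
    return x + 0.1 * c

rng = np.random.default_rng(0)
x = rng.standard_normal(4)       # current noisy sample
c = np.ones(4)                   # the true condition embedding
c_ind = rng.standard_normal(4)   # independent condition (ICG's replacement)

eps_icg = guided_eps(toy_eps(x, c), toy_eps(x, c_ind), scale=1.5)
```

At `scale = 1.0` the combination reduces to the plain conditional prediction, and larger scales extrapolate away from the independent-condition prediction, mirroring how CFG extrapolates away from the unconditional one.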
Researcher Affiliation | Collaboration | 1 ETH Zürich, 2 Disney Research|Studios. Email: romann.weber@disneyresearch.com
Pseudocode | Yes | The exact algorithms for ICG and TSG are provided in Algorithms 1 and 2, with corresponding pseudocode shown in Figures 11 and 12.
Open Source Code | No | The paper provides algorithms and pseudocode (Algorithms 1 and 2, Figures 11 and 12) and states that additional implementation details and hyperparameters are discussed in Appendix G, but it does not include an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | For text-to-image models, we use the evaluation subset of MS COCO 2017 (Lin et al., 2014) as the ground truth for captions and images.
Dataset Splits | Yes | For class-conditional models, the FID is computed between 10,000 (for DiT-XL/2) or 50,000 (for EDM and EDM2) generated images and the full training dataset. For text-to-image models, we use the evaluation subset of MS COCO 2017 (Lin et al., 2014) as the ground truth for captions and images.
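The FID comparisons described above reduce to a closed-form distance between two Gaussians fitted to image features. A minimal NumPy sketch of that formula (the feature extraction is omitted; a real evaluation uses Inception-v3 activations, and `sqrtm_psd`/`fid` are names chosen here for illustration):

```python
import numpy as np

def sqrtm_psd(a):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(mu1, sigma1, mu2, sigma2):
    # Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    # Tr((S1 S2)^{1/2}) is computed via the symmetric form
    # (S1^{1/2} S2 S1^{1/2})^{1/2} to stay in real arithmetic.
    diff = mu1 - mu2
    s1h = sqrtm_psd(sigma1)
    tr_sqrt = np.trace(sqrtm_psd(s1h @ sigma2 @ s1h))
    return float(diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * tr_sqrt)
```

In practice each (mu, sigma) pair is the mean and covariance of features from the generated set (10,000 or 50,000 images, per the counts above) and from the reference set, which is why the metric is sensitive to implementation details of the feature pipeline.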
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper states: 'All experiments are conducted via pre-trained checkpoints provided by official implementations.' However, it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The hyperparameters used in our experiments are listed in Tables 10 and 11.

Table 10: Hyperparameters used for the ICG experiments.
Model            | ICG mode       | ICG scale | CFG scale
DiT-XL/2         | Random class   | 1.4       | 1.5
Stable Diffusion | Random text    | 3.0       | 4.0
Pose-to-Image    | Gaussian noise | 3.0       | 4.0
MDM              | Gaussian noise | 2.5       | 2.5
EDM              | Random class   | 1.05      | 1.1
EDM2             | Random class   | 1.25      | 1.25

Table 11: Hyperparameters used for the TSG experiments.
Model            | Mode          | TSG function      | TSG scale | TSG parameters
DiT-XL/2         | Unconditional | constant_schedule | 5.0       | T_MIN = 200, T_MAX = 800, s = 1.0
DiT-XL/2         | Conditional   | power_schedule    | 2.5       | T_MIN = 0, T_MAX = 1000, α = 1, s = 2
Stable Diffusion | Unconditional | constant_schedule | 3.0       | T_MIN = 100, T_MAX = 900, s = 1.25
Stable Diffusion | Conditional   | power_schedule    | 4.0       | T_MIN = 400, T_MAX = 1000, s = 3.0, α = 0.25
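The exact definitions of `constant_schedule` and `power_schedule` are in the paper's Appendix G and are not reproduced in this report, so the following is only a hypothetical reading of the parameter names in Table 11: a flat weight `s` inside an active window [T_MIN, T_MAX] for `constant_schedule`, and a power-law ramp with exponent `α` for `power_schedule`. The functional forms here are assumptions for illustration, not the paper's schedules.

```python
def constant_schedule(t, t_min, t_max, s):
    # Hypothetical: flat TSG weight s while t lies in [t_min, t_max],
    # zero outside the window.
    return s if t_min <= t <= t_max else 0.0

def power_schedule(t, t_min, t_max, s, alpha, t_total=1000):
    # Hypothetical: weight ramps as a power of the normalized time step
    # inside the window, reaching s at t = t_total.
    if not (t_min <= t <= t_max):
        return 0.0
    return s * (t / t_total) ** alpha
```

Under this reading, the DiT-XL/2 unconditional row would apply a constant weight only in the middle of the diffusion trajectory (steps 200 to 800), while the conditional rows would weight late (high-t) steps more heavily.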