reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Classifier-Free Guidance is a Predictor-Corrector

Authors: Arwen Bradley, Preetum Nakkiran

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For demonstration purposes, we implement the PCG sampler for Stable Diffusion XL and observe that it produces samples qualitatively similar to CFG, with guidance scales determined by our theory. Further, we explore the design axes exposed by the PCG framework, namely guidance strength and Langevin iterations, to clarify their respective effects. ... Table 1 shows FID scores (Heusel et al., 2017) on ImageNet (Russakovsky et al., 2015), using EDM2 pretrained diffusion models (Karras et al., 2024b).
Researcher Affiliation	Industry	Arwen Bradley Apple, Cupertino CA, USA; Preetum Nakkiran Apple, Cupertino CA, USA
Pseudocode	Yes	Algorithm 1 PCGDDIM, Theory. (See Algorithm 2 for practical implementation.) ... Algorithm 2 PCGDDIM, explicit.
Open Source Code	No	No concrete access to source code is provided. The paper states: 'We do not intend to propose PCG as a practical sampling method (since with certain parameters it is equivalent to CFG, but far less efficient), but rather as a tool for understanding CFG.'
Open Datasets	Yes	Table 1 shows FID scores (Heusel et al., 2017) on ImageNet (Russakovsky et al., 2015), using EDM2 pretrained diffusion models (Karras et al., 2024b).
Dataset Splits	No	No specific dataset split information is provided. The paper states: 'Metrics are calculated using 50,000 samples and 200 sampling steps, generated using EDM2 checkpoints...'
Hardware Specification	No	No specific hardware details are mentioned for running the experiments. The paper refers to using 'EDM2 pretrained diffusion models' but does not specify the hardware on which their experiments were conducted.
Software Dependencies	No	No specific ancillary software details with version numbers are provided. The paper mentions using 'EDM2 pretrained diffusion models' but no other software dependencies or versions.
Experiment Setup	Yes	We run CFGDDPM with 200 denoising steps, and PCGDDIM with 100 denoising steps and K = 1 Langevin step per denoising step. Corresponding samples appear to have qualitatively similar guidance strengths, consistent with our theory. ... All samples used 1000 denoising steps for the base predictor. Overall, we observed that increasing Langevin steps tends to improve the overall image quality, while increasing guidance strength tends to improve prompt adherence.
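
The Pseudocode and Experiment Setup rows refer to a sampler that alternates a denoising (predictor) step with K Langevin (corrector) steps. As a rough illustration of that general structure only, here is a minimal sketch on a 1D Gaussian toy problem where the scores are known in closed form. It is not the paper's Algorithm 1 or 2: the variance-exploding noise schedule, the Langevin step size rule, the toy distributions, and every name (`pcg_sample`, `score_cond`, `score_uncond`, `step_scale`, and so on) are assumptions made for this example. The corrector is assumed to target the guided score (1 - gamma) * score_uncond + gamma * score_cond.

```python
import numpy as np

# Toy data distributions (illustrative assumptions, not from the paper):
#   unconditional p0(x)   = N(0, S_UNCOND^2)
#   conditional   p0(x|c) = N(M_COND, S_COND^2)
S_UNCOND, S_COND, M_COND = 2.0, 1.0, 1.0


def score_uncond(x, sigma):
    """Score of the unconditional toy distribution after adding N(0, sigma^2) noise."""
    return -x / (S_UNCOND**2 + sigma**2)


def score_cond(x, sigma):
    """Score of the conditional toy distribution after adding N(0, sigma^2) noise."""
    return -(x - M_COND) / (S_COND**2 + sigma**2)


def pcg_sample(n, gamma, num_levels=100, k_langevin=1,
               sigma_max=20.0, sigma_min=0.01, step_scale=0.1, seed=0):
    """Predictor-corrector sketch: DDIM-style predictor plus Langevin corrector.

    Hypothetical helper for illustration. The schedule and step sizes are
    assumptions chosen so the toy example is numerically stable.
    """
    rng = np.random.default_rng(seed)
    # Geometric noise schedule from sigma_max down to sigma_min.
    sigmas = np.geomspace(sigma_max, sigma_min, num_levels)
    # Start from (approximately) pure noise at the highest noise level.
    x = sigma_max * rng.standard_normal(n)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Predictor: deterministic DDIM-style step using the conditional score.
        x = x + sigma * (sigma - sigma_next) * score_cond(x, sigma)
        # Corrector: K unadjusted Langevin steps on the guided score.
        eps = step_scale * sigma_next**2
        for _ in range(k_langevin):
            guided = ((1.0 - gamma) * score_uncond(x, sigma_next)
                      + gamma * score_cond(x, sigma_next))
            x = x + 0.5 * eps * guided + np.sqrt(eps) * rng.standard_normal(n)
    return x


if __name__ == "__main__":
    for gamma in (1.0, 3.0):
        samples = pcg_sample(20_000, gamma)
        print(f"gamma={gamma}: mean={samples.mean():.3f}, std={samples.std():.3f}")
```

With `gamma=1.0` the corrector targets the conditional distribution itself, so the samples should stay close to N(1, 1). With `gamma=3.0` the corrector targets a sharper distribution, so in this toy setting the samples should come out narrower and shifted further from the unconditional mean. Raising `k_langevin` gives the corrector more time to equilibrate at each noise level.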