Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter, while being training-free and retaining the capacity to handle negative prompts. |
| Researcher Affiliation | Collaboration | 1Center for Research and Formation in Artificial Intelligence, Universidad de los Andes 2Gen AI, Meta 3King Abdullah University of Science and Technology (KAUST) |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not include a distinct section or figure explicitly labeled as 'Pseudocode' or 'Algorithm' with structured steps. |
| Open Source Code | No | The paper includes a footnote stating 'Find more at bcv-uniandes.github.io/adaptiveguidance-wp/', which points to a project webpage. However, it does not explicitly state that the source code for the methodology described in the paper is released there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Training involves 10k noise-image pairs from CC3M (Sharma et al. 2018) with T = 20 DPM++ (Lu et al. 2022a) steps and a guidance strength of s = 7.5. Evaluation metrics are computed on 1k real-world user prompts from the OUI dataset (Dai et al. 2023), which represents diverse, practical user inputs. |
| Dataset Splits | No | The paper mentions 'Training involves 10k noise-image pairs from CC3M' and 'Evaluation metrics are computed on 1k real-world user prompts from the OUI dataset'. While it specifies the amount of data used for training and evaluation from these datasets, it does not describe how the data was split into training, validation, or test sets, whether as percentages, absolute counts per split, or references to standard predefined splits. |
| Hardware Specification | Yes | The search took approximately 1.5 days on a Quadro RTX 8000. Per-model latency, AG (30 NFEs) vs. CFG (40 NFEs), with AG gain: EMU 9.5b on H100: 3251±26 vs. 3822±31 (15%); EMU 2.7b on A100: 2634±8 vs. 3184±6 (17%); SD XL 2.5b on V100: 4584±8 vs. 5876±10 (22%); SD XL 2.5b on RTX 8000: 4932±5 vs. 6339±6 (22%); SD 1.5 0.8b on GTX 1080Ti: 4396±27 vs. 5542±30 (21%). |
| Software Dependencies | Yes | All models run in half-precision in PyTorch 2. |
| Experiment Setup | Yes | Experimental Setup. We optimize guidance policies for text-to-image generation using the Stable Diffusion architecture (Rombach et al. 2022), referred to as LDM-512. We train LDM-512 from scratch on a commissioned dataset; the model has 900M parameters and generates 512×512 resolution images from a 4×64×64 latent space. Training involves 10k noise-image pairs from CC3M (Sharma et al. 2018) with T = 20 DPM++ (Lu et al. 2022a) steps and a guidance strength of s = 7.5. ... We optimize Eq. (6) using the Lion optimizer (Chen et al. 2023) for 5 epochs. |
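The NFE counts quoted above (AG at 30 NFEs vs. CFG at 40 NFEs over T = 20 sampler steps, a 25% reduction) follow from CFG needing two network evaluations per step while an AG-style policy can drop the unconditional pass after a cutoff step. The sketch below is illustrative only: `cfg_step`, `count_nfes`, and the `cutoff` value are hypothetical names chosen here, not code from the paper.

```python
def cfg_step(eps_uncond, eps_cond, s=7.5):
    """Standard classifier-free guidance combination at one sampler step:
    eps = eps_uncond + s * (eps_cond - eps_uncond), with guidance strength s."""
    return eps_uncond + s * (eps_cond - eps_uncond)


def count_nfes(T=20, cutoff=10):
    """Neural function evaluations (NFEs) for plain CFG vs. an AG-style
    policy that skips the unconditional pass after `cutoff` steps.
    The cutoff of 10 is an illustrative choice that reproduces the
    paper's 30-vs-40 NFE figures; the actual policy is learned."""
    cfg_nfes = 2 * T                      # cond + uncond at every step
    ag_nfes = 2 * cutoff + (T - cutoff)   # uncond pass only before cutoff
    return cfg_nfes, ag_nfes


cfg_nfes, ag_nfes = count_nfes(T=20, cutoff=10)
saving = 1 - ag_nfes / cfg_nfes  # 30 vs. 40 NFEs -> 25% fewer evaluations
```

With T = 20 and this cutoff, the counts match the table above (40 NFEs for CFG, 30 for AG), giving the quoted 25% compute reduction.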