Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter, while being training-free and retaining the capacity to handle negative prompts. |
| Researcher Affiliation | Collaboration | 1Center for Research and Formation in Artificial Intelligence, Universidad de los Andes 2Gen AI, Meta 3King Abdullah University of Science and Technology (KAUST) |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not include a distinct section or figure explicitly labeled as 'Pseudocode' or 'Algorithm' with structured steps. |
| Open Source Code | No | The paper includes a footnote stating 'Find more at bcv-uniandes.github.io/adaptiveguidance-wp/', which points to a project webpage. However, it does not explicitly state that the source code for the methodology described in the paper is released there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Training involves 10k noise-image pairs from CC3M (Sharma et al. 2018) with T = 20 DPM++ (Lu et al. 2022a) steps and a guidance strength of s = 7.5. Evaluation metrics are computed on 1k real-world user prompts from the OUI dataset (Dai et al. 2023), which represents diverse, practical user inputs. |
| Dataset Splits | No | The paper mentions 'Training involves 10k noise-image pairs from CC3M' and 'Evaluation metrics are computed on 1k real-world user prompts from the OUI dataset'. While it specifies the amount of data used for training and evaluation from these datasets, it does not describe how the data was split into training, validation, or test sets, whether as percentages, absolute counts per split, or references to standard predefined splits. |
| Hardware Specification | Yes | The search took approximately 1.5 days on a Quadro RTX 8000. Per-model latency, AG (30 NFEs) vs. CFG (40 NFEs), with AG gain: EMU 9.5b on H100: 3251±26 vs. 3822±31 (15%); EMU 2.7b on A100: 2634±8 vs. 3184±6 (17%); SD XL 2.5b on V100: 4584±8 vs. 5876±10 (22%); SD XL 2.5b on RTX 8000: 4932±5 vs. 6339±6 (22%); SD 1.5 0.8b on GTX 1080Ti: 4396±27 vs. 5542±30 (21%). |
| Software Dependencies | Yes | All models run in half-precision in PyTorch 2. |
| Experiment Setup | Yes | Experimental Setup. We optimize guidance policies for text-to-image generation using the Stable Diffusion architecture (Rombach et al. 2022), referred to as LDM-512. We train LDM-512 from scratch on a commissioned dataset; the model has 900M parameters and generates 512×512 resolution images from a 4×64×64 latent space. Training involves 10k noise-image pairs from CC3M (Sharma et al. 2018) with T = 20 DPM++ (Lu et al. 2022a) steps and a guidance strength of s = 7.5. ... We optimize Eq. (6) using the Lion optimizer (Chen et al. 2023) for 5 epochs. |
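The NFE counts quoted above (AG at 30 NFEs vs. CFG at 40 NFEs over T = 20 sampler steps, a 25% reduction) follow from CFG needing two network evaluations per step while an AG-style policy can drop the unconditional pass after a cutoff step. The sketch below is illustrative only: `cfg_step`, `count_nfes`, and the `cutoff` value are hypothetical names chosen here, not code from the paper.

```python
def cfg_step(eps_uncond, eps_cond, s=7.5):
    """Standard classifier-free guidance combination at one sampler step:
    eps = eps_uncond + s * (eps_cond - eps_uncond), with guidance strength s."""
    return eps_uncond + s * (eps_cond - eps_uncond)


def count_nfes(T=20, cutoff=10):
    """Neural function evaluations (NFEs) for plain CFG vs. an AG-style
    policy that skips the unconditional pass after `cutoff` steps.
    The cutoff of 10 is an illustrative choice that reproduces the
    paper's 30-vs-40 NFE figures; the actual policy is learned."""
    cfg_nfes = 2 * T                      # cond + uncond at every step
    ag_nfes = 2 * cutoff + (T - cutoff)   # uncond pass only before cutoff
    return cfg_nfes, ag_nfes


cfg_nfes, ag_nfes = count_nfes(T=20, cutoff=10)
saving = 1 - ag_nfes / cfg_nfes  # 30 vs. 40 NFEs -> 25% fewer evaluations
```

With T = 20 and this cutoff, the counts match the table above (40 NFEs for CFG, 30 for AG), giving the quoted 25% compute reduction.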