Variational Control for Guidance in Diffusion Models
Authors: Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, Stephan Mandt
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our method serves as a general framework for guidance in diffusion models, here, we focus on solving inverse problems and style-guided generation. Through both quantitative and qualitative results, we demonstrate that our approach outperforms recent state-of-the-art baselines across these tasks using pretrained diffusion models. Lastly, we emphasize key design parameters of our proposed method as ablations. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, University of California, Irvine 2Chan-Zuckerberg Initiative 3Department of Statistics, University of California, Irvine. |
| Pseudocode | Yes | We provide a visual illustration of the NDTM algorithm in Fig. 1a and its pseudocode implementation in Algorithm 1. |
| Open Source Code | Yes | Our code will be available at https://github.com/czi-ai/oc-guidance. |
| Open Datasets | Yes | For inverse problems, we conduct experiments on the FFHQ (256×256) (Karras et al., 2019) and ImageNet (256×256) (Deng et al., 2009) datasets, using a held-out validation set of 1,000 samples from each. |
| Dataset Splits | Yes | For inverse problems, we conduct experiments on the FFHQ (256×256) (Karras et al., 2019) and ImageNet (256×256) (Deng et al., 2009) datasets, using a held-out validation set of 1,000 samples from each. For style guidance, following MPGD (He et al., 2024), we randomly generate 1k (prompt, image) pairs using images from WikiArt (Saleh & Elgammal, 2015) and prompts from PartiPrompt (Yu et al., 2022). With the exception of BID (for which we use 100 images), we evaluate all other inverse problems on 1k images. |
| Hardware Specification | Yes | The runtime numbers are in wall-clock time (seconds) and tested on a single RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2017) and other baselines, but does not specify version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2017) with default hyperparameters, fixing the learning rate to 0.01 for updating the control u_t across all tasks and fixing the kernel learning rate in the BID task to 0.01. We refer to the loss weighting scheme in Eq. 16 as "DDIM weighting". Moreover, we use linear decay for the learning rate. We perform 50 diffusion steps using the DDIM sampler across all datasets and tasks. We tune the guidance weight µ, the number of optimization steps N, the loss weighting (w_T, w_s, w_c), DDIM weighting, and the truncation time τ (Chung et al., 2022b) for best performance across different tasks. All these hyperparameters are listed in Table 9. |
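The reported setup (Adam with learning rate 0.01 for the control u_t, a linearly decayed learning rate, and 50 DDIM sampling steps) can be sketched as a small configuration helper. This is a hedged illustration only: the function and constant names are hypothetical, and the exact endpoints of the paper's linear decay schedule are an assumption, not taken from the released code.

```python
# Illustrative sketch of the paper's stated hyperparameters; names are
# hypothetical and the decay endpoints (base_lr -> 0) are an assumption.
BASE_LR = 0.01             # Adam learning rate for the control u_t (paper value)
NUM_DIFFUSION_STEPS = 50   # DDIM sampler steps across all tasks (paper value)

def linear_decay_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Linearly decay the learning rate from base_lr toward 0 over total_steps."""
    return base_lr * (1.0 - step / total_steps)

# Learning rate used at each of the 50 diffusion steps.
schedule = [linear_decay_lr(t, NUM_DIFFUSION_STEPS) for t in range(NUM_DIFFUSION_STEPS)]
```

In a full implementation, each entry of `schedule` would be assigned to the Adam optimizer's parameter group before the corresponding control-update step; per-task values of the guidance weight µ, optimization steps N, and truncation time τ would come from the paper's Table 9.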