Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on several important reward functions and demonstrate that the proposed DNO approach can achieve state-of-the-art reward scores within a reasonable time budget for generation. |
| Researcher Affiliation | Collaboration | 1. School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; 2. DAMO Academy, Alibaba Group; 3. Department of Electrical and Computer Engineering, University of Minnesota, USA; 4. Hupan Lab, Zhejiang Province, China; 5. Shenzhen Research Institute of Big Data. |
| Pseudocode | Yes | A. DDIM Sampling Algorithm. In Algorithm 1 described below, we summarize the sampling algorithm for diffusion models, DDIM (Song et al., 2020a), which is essentially the Euler method for solving ODEs/SDEs. (A minimal code sketch of this sampler appears after the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It discusses implementation details but does not include a repository link or an explicit statement of code release. |
| Open Datasets | No | The paper mentions using a 'simple animals prompt dataset' following (Clark et al., 2023; Black et al., 2023) but does not provide concrete access information (link, DOI, repository, or specific citation for the dataset itself). |
| Dataset Splits | No | The paper discusses evaluating generated samples using reward functions and a 'simple animals prompt dataset', but it does not specify any training/test/validation splits for a dataset used in their experiments. |
| Hardware Specification | Yes | For all the following experiments, unless explicitly stated otherwise, a single run of DNO is performed on a single A800 GPU. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) as a tool used, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In this experiment, to solve the probability-regularized noise optimization problem as formulated in Equation (5), we employ the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.01. For optimization with regularization, we set the regularization coefficient γ to 1. To compute the minibatch stochastic gradient for the regularization term in Equation (5), we set the batch size b, i.e., the number of random permutations drawn at each step, to 100. Additionally, in Appendix B.4, the paper states: "We adopt the DDIM sampler with 50 steps and η = 1 for generation, and optimize all the injected noise in the generation process, the same as most experiments in this work. The classifier-free guidance is set to 5.0." (A sketch of this optimization loop appears after the table.) |
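
The DDIM pseudocode referenced in the table (Algorithm 1) is the standard sampler of Song et al. (2020a). The following is a minimal PyTorch sketch of that update rule, not the authors' code: the `eps_model` interface, the `alphas_cumprod` schedule, and the toy usage are assumptions made for illustration.

```python
# A minimal PyTorch sketch of DDIM sampling (Song et al., 2020a); an
# illustration, not the authors' code. `eps_model`, the noise schedule,
# and the toy usage below are assumptions made for this example.
import torch

def ddim_sample(eps_model, alphas_cumprod, shape, num_steps=50, eta=1.0):
    """Sample with `num_steps` DDIM steps; eta=0 gives the deterministic
    ODE sampler, eta=1 matches the stochastic setting quoted above."""
    T = len(alphas_cumprod)
    timesteps = torch.linspace(T - 1, 0, num_steps).long()
    x = torch.randn(shape)  # x_T ~ N(0, I): the initial injected noise
    for i in range(num_steps):
        t = timesteps[i]
        last = i + 1 == num_steps
        a_t = alphas_cumprod[t]
        a_prev = torch.tensor(1.0) if last else alphas_cumprod[timesteps[i + 1]]
        eps = eps_model(x, t)  # model's noise prediction at step t
        # Predicted clean sample x_0, then an Euler-style step toward t_prev.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
        noise = torch.zeros_like(x) if last else torch.randn_like(x)
        x = (a_prev.sqrt() * x0_pred
             + (1 - a_prev - sigma**2).clamp(min=0).sqrt() * eps
             + sigma * noise)
    return x

# Toy usage: a zero "denoiser" just to exercise the call signature.
eps_model = lambda x, t: torch.zeros_like(x)
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)
sample = ddim_sample(eps_model, alphas_cumprod, shape=(1, 3, 32, 32))
```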
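
To make the experiment-setup row concrete, the sketch below shows a direct noise optimization loop under the quoted hyperparameters (Adam with learning rate 0.01, regularization coefficient γ = 1). The names `generate`, `reward_fn`, and `prob_reg` are hypothetical stand-ins for the paper's differentiable DDIM sampler, reward function, and Equation (5) probability regularizer; the paper estimates the regularizer's gradient with b = 100 random permutations per step, a detail abstracted behind `prob_reg` here.

```python
# A hedged sketch of direct noise optimization with the hyperparameters
# quoted above (Adam, lr = 0.01, gamma = 1). `generate`, `reward_fn`, and
# `prob_reg` are hypothetical stand-ins; this is not the authors' code.
import torch

def dno_optimize(generate, reward_fn, prob_reg, noise_shapes,
                 steps=100, lr=0.01, gamma=1.0):
    # Every noise vector injected during sampling becomes a variable.
    noises = [torch.randn(s, requires_grad=True) for s in noise_shapes]
    opt = torch.optim.Adam(noises, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        sample = generate(noises)               # differentiable sampling chain
        loss = -reward_fn(sample)               # ascend the reward
        loss = loss + gamma * prob_reg(noises)  # keep noises near-Gaussian
        loss.backward()
        opt.step()
    return noises

# Toy usage with dummy differentiable stand-ins.
generate = lambda ns: sum(n.mean() for n in ns)
reward_fn = lambda s: -(s - 1.0) ** 2
prob_reg = lambda ns: sum((n**2).mean() for n in ns)
opt_noises = dno_optimize(generate, reward_fn, prob_reg, noise_shapes=[(4,), (4,)])
```

Descending on the negated reward while penalizing deviation of the optimized noises from a standard Gaussian mirrors the regularized objective the paper describes; only the regularizer's permutation-based estimator is abstracted away here.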