Inference-Time Alignment of Diffusion Models with Direct Noise Optimization

Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on several important reward functions and demonstrate that the proposed DNO approach can achieve state-of-the-art reward scores within a reasonable time budget for generation.
Researcher Affiliation | Collaboration | (1) School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China; (2) DAMO Academy, Alibaba Group; (3) Department of Electrical and Computer Engineering, University of Minnesota, USA; (4) Hupan Lab, Zhejiang Province, China; (5) Shenzhen Research Institute of Big Data.
Pseudocode | Yes | A. DDIM Sampling Algorithm. In Algorithm 1, described below, we summarize the sampling algorithm for diffusion models, DDIM (Song et al., 2020a), which is essentially the Euler method for solving ODEs/SDEs.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It discusses implementation details but does not include a repository link or an explicit statement of code release.
Open Datasets | No | The paper mentions using a 'simple animals prompt dataset' following (Clark et al., 2023; Black et al., 2023) but does not provide concrete access information (link, DOI, repository, or specific citation for the dataset itself).
Dataset Splits | No | The paper discusses evaluating generated samples using reward functions and a 'simple animals prompt dataset', but it does not specify any training/test/validation splits for a dataset used in their experiments.
Hardware Specification | Yes | For all the following experiments, unless explicitly stated otherwise, a single run of DNO is performed on a single A800 GPU.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) as a tool used, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In this experiment, to solve the probability-regularized noise optimization problem formulated in Equation (5), we employ the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.01. For optimization with regularization, we set the regularization coefficient γ to 1. To compute the minibatch stochastic gradient for the regularization term in Equation (5), we set the batch size b, i.e., the number of random permutations drawn at each step, to 100. Additionally, in Appendix B.4, the paper states: "We adopt the DDIM sampler with 50 steps and η = 1 for generation, and optimize all the injected noise in the generation process, the same as most experiments in this work. The classifier-free guidance is set to 5.0."
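The DDIM update summarized in the paper's Algorithm 1 can be sketched as follows. This is a minimal NumPy illustration of a single DDIM step (Song et al., 2020a), not the authors' code; the noise estimate `eps` is assumed to come from a trained network, which is omitted here.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_t, alpha_prev, eta, rng):
    """One DDIM update x_t -> x_{t-1} (Song et al., 2020a).

    eta = 0 gives the deterministic ODE sampler; eta = 1 recovers
    DDPM-like stochastic sampling, as used in the paper's experiments.
    alpha_t / alpha_prev are the cumulative noise-schedule products.
    """
    # Predicted clean sample x_0 from the current noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Standard deviation of the noise injected at this step.
    sigma = (eta * np.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t))
                 * np.sqrt(1.0 - alpha_t / alpha_prev))
    # Deterministic direction pointing toward x_t, plus fresh noise.
    dir_xt = np.sqrt(1.0 - alpha_prev - sigma**2) * eps
    z = rng.standard_normal(x_t.shape)
    return np.sqrt(alpha_prev) * x0_pred + dir_xt + sigma * z
```

With `eta = 0` the step is deterministic (the Euler/ODE view mentioned in the table); with `eta = 1` each step injects fresh Gaussian noise `z`, and it is exactly these injected noises that DNO optimizes.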
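The optimization loop implied by the setup above (Adam, learning rate 0.01, regularization coefficient γ = 1) can be sketched as a toy example. Everything below is a hypothetical stand-in: a linear map `W` replaces the 50-step DDIM sampler, a quadratic reward replaces the real reward model, and a Gaussian-prior term `-gamma * z` loosely plays the role of the minibatch probability regularization of Equation (5). The real method instead backpropagates the reward through the full diffusion sampler.

```python
import numpy as np

def adam_update(z, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba, 2014); ascends the gradient g."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return z + lr * m_hat / (np.sqrt(v_hat) + eps), m, v

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))       # toy stand-in for the frozen sampler
target = np.array([1.0, -1.0, 0.5])   # toy reward: r(x) = -||x - target||^2

z = rng.standard_normal(5)            # the injected noise being optimized
m, v = np.zeros_like(z), np.zeros_like(z)
gamma = 1.0                           # regularization coefficient, as in the setup
init_reward = -np.sum((W @ z - target)**2)

for t in range(1, 501):
    x = W @ z
    # Gradient of the reward w.r.t. z, plus a Gaussian-prior regularizer
    # standing in for the probability regularization of Equation (5).
    grad = W.T @ (-2.0 * (x - target)) - gamma * z
    z, m, v = adam_update(z, grad, m, v, t, lr=0.01)

final_reward = -np.sum((W @ z - target)**2)
```

The loop directly mirrors the reported configuration (Adam, lr 0.01, γ = 1); in the paper, `z` would be the full set of noises injected across the 50 DDIM steps, and the gradient would be obtained by automatic differentiation in PyTorch.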