Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization
Authors: Taeyoung Yun, Kiyoung Om, Jaewoo Lee, Sujin Yun, Jinkyoo Park
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms state-of-the-art baselines across synthetic and real-world tasks. Our code is publicly available here. ...We conduct extensive experiments on four synthetic and three real-world high-dimensional black-box optimization tasks. We demonstrate that our method achieves superior performance on a variety of tasks compared to state-of-the-art baselines, including BO methods, generative model-based methods, and evolutionary algorithms. |
| Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST). Correspondence to: Taeyoung Yun <EMAIL>, Kiyoung Om <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 DiBO — 1: Input: initial dataset D0; max rounds R; batch size B; buffer size L; diffusion models pθ, pψ; proxies fϕ1, …, fϕK 2: for r = 0, …, R−1 do 3: Phase 1. Training Models 4: Compute weights w(y, Dr) with Equation (9) 5: Train fϕ1, …, fϕK with Equation (10) 6: Train pθ with Equation (11) 8: Phase 2. Sampling Candidates 9: Initialize pψ ← pθ 10: Train pψ with Equation (14) using x0:T from pψ or from the dataset Dr 11: Sample {x1, …, xM} ∼ pψ(x) 12: Update {x1, …, xM} into {x′1, …, x′M} with Equation (15) 13: Filter top-B samples {x1, …, xB} among {x′1, …, x′M} 15: Evaluation and Moving Dataset 16: Evaluate yb = f(xb), b = 1, …, B 17: Update Dr+1 ← Dr ∪ {(xb, yb)} for b = 1, …, B 18: if |Dr+1| > L then 19: Remove bottom-(|Dr+1| − L) samples from Dr+1 20: end if 21: end for |
| Open Source Code | Yes | Our code is publicly available here. |
| Open Datasets | Yes | We conduct experiments on four synthetic and three real-world high-dimensional black-box optimization tasks, including HalfCheetah-102D from MuJoCo Locomotion, Rover Planning-100D, and DNA-180D from LassoBench. The description of each task is available in Appendix A. ...The MuJoCo locomotion task (Todorov et al., 2012) is a popular benchmark in Reinforcement Learning (RL). ...Rover Trajectory Optimization is a task determining the trajectory of a rover in a 2D environment suggested by Wang et al. (2018). ...LassoBench (Šehić et al., 2022) is a challenge focused on optimizing the hyperparameters of Weighted LASSO (Least Absolute Shrinkage and Selection Operator) regression. |
| Dataset Splits | Yes | All experiments are conducted with initial dataset size |D0| = 200, batch size B = 100, and 10,000 as the maximum evaluation limit. ...Each experiment starts with |D0| = 100 initial samples, a batch size of B = 50, and a maximum evaluation limit of 2,000. |
| Hardware Specification | Yes | All training is done with a single NVIDIA RTX 3090 GPU and Intel Xeon Platinum CPU @ 2.90GHz. |
| Software Dependencies | No | The paper mentions software like "torchdiffeq (Chen, 2018)" and "pycma (Hansen et al., 2019)" and general frameworks like "PyTorch" and "Adam (Kingma, 2015) optimizer" but does not specify exact version numbers for these software libraries or frameworks. For example, it does not state "torchdiffeq 0.2.1" or "PyTorch 1.13.1". |
| Experiment Setup | Yes | We train an ensemble of five proxies. To implement the proxy function, we use an MLP with three hidden layers, each consisting of 256 (512 for 400-dim tasks) hidden units and GELU (Hendrycks & Gimpel, 2016) activations. We train a proxy model using the Adam (Kingma, 2015) optimizer for 50 (100 for 400-dim tasks) epochs per round, with a learning rate of 1×10⁻³. We set the batch size to 256. The hyperparameters related to the proxy are listed in Table 1. ...We utilize the temporal Residual MLP architecture from Venkatraman et al. (2024) as the backbone of our diffusion model. The architecture consists of three hidden layers, each containing 512 hidden units. We implement GELU activations alongside layer normalization (Ba, 2016). During training, we use the Adam optimizer for 50 epochs (100 for 400-dim tasks) per round with a learning rate of 1×10⁻³. We set the batch size to 256. We employ linear variance scheduling and noise prediction networks with 30 diffusion steps for all tasks. The hyperparameters related to the diffusion model are summarized in Table 2. ...For the upper confidence bound (UCB), we fixed γ = 1.0, which controls the exploration-exploitation trade-off. For the target posterior distribution, the inverse temperature parameter β controls the trade-off between the influence of exp(rϕ(x)) and pθ(x). When selecting querying candidates, we sample M = B × 10² candidates from pψ(x), perform a local search for J steps, and retain B candidates for batched querying. After querying and adding candidates, we maintain our training dataset to contain L high-scoring samples. We present the detailed hyperparameter settings in Table 4. We also conduct several ablation studies to explore the effect of each hyperparameter on the performance. |
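The candidate-selection and buffer-maintenance steps quoted above (UCB scoring over the proxy ensemble with γ = 1.0, top-B filtering, and trimming the dataset to the L highest-scoring samples) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the trained proxies are stood in for by plain callables, and the names `ucb_score`, `select_top_b`, and `trim_buffer` are hypothetical.

```python
import numpy as np

def ucb_score(candidates, proxies, gamma=1.0):
    """Upper confidence bound over an ensemble of proxy models.

    candidates: (M, d) array of sampled designs.
    proxies: list of callables, each mapping (M, d) -> (M,) predictions.
    gamma: exploration-exploitation trade-off (the paper fixes gamma = 1.0).
    """
    preds = np.stack([p(candidates) for p in proxies])  # shape (K, M)
    # Ensemble mean rewards exploitation; ensemble std rewards exploration.
    return preds.mean(axis=0) + gamma * preds.std(axis=0)

def select_top_b(candidates, proxies, b, gamma=1.0):
    """Filter the top-B candidates by UCB score for batched querying."""
    scores = ucb_score(candidates, proxies, gamma)
    idx = np.argsort(scores)[::-1][:b]  # indices of the B highest scores
    return candidates[idx]

def trim_buffer(dataset, max_size):
    """Keep only the max_size highest-scoring (x, y) pairs, mirroring the
    bottom-sample removal at the end of each round in Algorithm 1."""
    if len(dataset) <= max_size:
        return dataset
    return sorted(dataset, key=lambda xy: xy[1], reverse=True)[:max_size]
```

With B = 100 and M = B × 10², `select_top_b` would pick 100 designs out of 10,000 diffusion samples per round before the true function f is queried.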