A First-order Generative Bilevel Optimization Framework for Diffusion Models
Authors: Quan Xiao, Hui Yuan, A F M Saif, Gaowen Liu, Ramana Rao Kompella, Mengdi Wang, Tianyi Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the experimental results of the proposed bilevel-diffusion algorithms in two applications: reward fine-tuning and noise scheduling for diffusion models, and compare them with baseline hyperparameter optimization methods: grid search, random search, and Bayesian search (Snoek et al., 2012). Table 1 presents the average FID, CLIP score, and execution time for each method over prompts. |
| Researcher Affiliation | Collaboration | 1Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 2Department of Electrical and Computer Engineering, Cornell Tech, Cornell University, New York, NY 3Department of Electrical and Computer Engineering, Princeton University, NJ 4Cisco Research. |
| Pseudocode | Yes | Algorithm 1: A meta generative bilevel algorithm; Algorithm 2: Bilevel approach with pre-trained model; Algorithm 3: Score network training; Algorithm 4: Backward sampling; Algorithm 5: Guided Diffusion for Generative Optimization; Algorithm 6: Bilevel Approach without Pre-trained Diffusion Model |
| Open Source Code | Yes | Experiments demonstrate that our method outperforms existing finetuning and hyperparameter search baselines. Our code has been released at https://github.com/afmsaif/bilevel_diffusion. |
| Open Datasets | Yes | We evaluated our bilevel noise scheduling method, detailed in Algorithm 6, paired with DDIM backward sampling for image generation on the MNIST dataset. For this experiment, we use the Stable Diffusion V1.5 model as our pre-trained model and employ a ResNet-18 architecture (trained on the ImageNet dataset) as the synthetic (lower-level) reward model. |
| Dataset Splits | No | The paper mentions using the MNIST dataset and generated images for evaluation, but it does not specify any explicit training/test/validation splits (e.g., percentages or sample counts) for these datasets within the text. It implies the use of standard datasets but does not detail their partitioning. |
| Hardware Specification | Yes | All experiments were conducted on two servers: one with four NVIDIA A6000 GPUs and 256 GB of RAM; one with an Intel i9-9960X CPU and two NVIDIA A5000 GPUs. |
| Software Dependencies | Yes | Although it is possible to obtain the gradient of L_SQ(θ, q) with respect to θ using PyTorch's auto-differentiation, it requires differentiating through the backward sampling trajectory. M. pytorch-fid: FID Score for PyTorch. https://github.com/mseitzer/pytorch-fid, August 2020. Version 0.3.0. |
| Experiment Setup | Yes | We use a batch size of 3 for the fine-tuning step, set the number of optimization steps to 7, and repeat the optimization 4 times. We use a batch size of 128 and choose the number of inner loops S_z for θ_z updates as 1. Empirically, we found that, at the beginning of the training process (i.e., when k = 0), the number of inner loops S_y^0 for updating θ_y should be larger to obtain a relatively reasonable U-Net, but later on we do not need large inner loops, i.e., we set S_y^k = 10 for k ≥ 1. We formalize this stage as the initial epoch, where we traverse every batch and set S_y^0 = 20. We choose the ZO perturbation amount as ν = 0.01. |
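The setup above mentions a zeroth-order (ZO) perturbation amount ν = 0.01, which avoids differentiating through the backward sampling trajectory. As a minimal sketch of what such an estimator could look like — the function name `zo_gradient`, the two-point Gaussian-smoothing form, and the `loss_fn` interface are all assumptions, not the paper's actual implementation:

```python
import torch


def zo_gradient(loss_fn, theta, nu=0.01, n_samples=1):
    """Two-point zeroth-order estimate of the gradient of loss_fn at theta.

    Illustrative sketch only: the paper reports nu = 0.01, but the exact
    estimator it uses is not shown; this is a standard Gaussian-smoothing
    central-difference form.
    """
    grad = torch.zeros_like(theta)
    for _ in range(n_samples):
        u = torch.randn_like(theta)  # random Gaussian search direction
        # central difference along u, scaled back onto the direction
        delta = loss_fn(theta + nu * u) - loss_fn(theta - nu * u)
        grad += (delta / (2 * nu)) * u
    return grad / n_samples
```

Because each sample only needs two loss evaluations, this kind of estimator sidesteps backpropagation through the sampling chain entirely, at the cost of estimator variance that shrinks as `n_samples` grows.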