Direct Distributional Optimization for Provable Alignment of Diffusion Models
Authors: Ryotaro Kawata, Kazusato Oko, Atsushi Nitanda, Taiji Suzuki
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate its performance on synthetic and image datasets using the DPO objective. |
| Researcher Affiliation | Academia | (1) Department of Mathematical Informatics, University of Tokyo, Japan; (2) Center for Advanced Intelligence Project, RIKEN, Japan; (3) Department of EECS, UC Berkeley; (4) CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore; (5) College of Computing and Data Science, Nanyang Technological University, Singapore |
| Pseudocode | Yes | Algorithm 4.1 (Dual Averaging, DA). Input: s, the pre-trained score; f_1, an initialized neural network. Output: f_K, a trained potential. Set q^(0) = p_ref and q^(1) ∝ exp(−f_1)·p_ref. For k = 1, ..., K−1: obtain g^(k) via the DA algorithm with Option 1 (Eq. (3)) or Option 2 (Eq. (4)) using the recurrence formula (47), where q̂^(k+1) ∝ exp(−g^(k))·p_ref is the ideal update; train a neural network f_{k+1} to approximate g^(k), and set q^(k+1) ∝ exp(−f_{k+1})·p_ref. |
| Open Source Code | No | The paper mentions using 'Hugging Face (2022)' and 'Diffusers library von Platen et al. (2022)' which are third-party tools. There is no explicit statement or link provided for the authors' own implementation code. |
| Open Datasets | Yes | We used 10000 images of Head CT in Medical MNIST (Lozano, 2017) and augmented it by rotating images up to 40000. |
| Dataset Splits | Yes | The training data was 95% of augmented 40000 images and the validation data was 5% of them. |
| Hardware Specification | Yes | We used 8 NVIDIA V100 GPUs with 32GB memory. |
| Software Dependencies | No | The paper mentions 'UNet2DModel in Diffusers library von Platen et al. (2022)' but does not provide specific version numbers for software components such as Python, PyTorch, or the Diffusers library itself. |
| Experiment Setup | Yes | We set the hyperparameter β in [0.04, 0.2]... The learning rate of pre-training was 0.0005, the batch size was 100, the number of epochs was 1000... The regularization terms β and γ were 0.04 and 0.1... the learning rate = 0.0005 and 0.0001 for w/ reg and w/o reg respectively, and batch size = 5000... β and γ were 0.05 and 1. The output of the functional derivative was clipped at 20 to stabilize the training step... f_k was trained on 1024 images from a pool of 6400 images, the learning rate was 0.0001, the batch size was 64, and the number of epochs was 5. |
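The Dual Averaging loop reported under "Pseudocode" maintains a density of the form q^(k) ∝ exp(−f_k)·p_ref and repeatedly updates the potential f_k toward an ideal target g^(k). The following is a minimal sketch of that outer loop on a 1-D grid, with exact grid potentials standing in for neural networks. The objective F(q) = −E_q[reward] + β·KL(q ‖ p_ref), the toy reward, and the constant-step gradient-accumulation recurrence are illustrative assumptions, not the paper's Eq. (3)/(4) or recurrence (47).

```python
import numpy as np

# Sketch of the Algorithm 4.1 outer loop on a 1-D grid.
# Assumptions (not from the paper): the regularized objective
# F(q) = -E_q[reward] + beta * KL(q || p_ref), a quadratic toy reward,
# and a constant-step dual-averaging update on the potential f.

xs = np.linspace(-4.0, 4.0, 801)
dx = xs[1] - xs[0]

p_ref = np.exp(-0.5 * xs**2)           # stand-in for the pre-trained model p_ref
p_ref /= p_ref.sum() * dx

reward = -((xs - 1.0) ** 2)            # toy reward peaked at x = 1
beta, eta, K = 0.5, 0.5, 200           # regularization, step size, outer iterations

def density(f):
    """q ∝ exp(-f) * p_ref, normalized on the grid."""
    q = np.exp(-f) * p_ref
    return q / (q.sum() * dx)

f = np.zeros_like(xs)                  # f_1 = 0, so q^(1) = p_ref
for k in range(K):
    q = density(f)
    # First variation of F at q^(k): -reward + beta * log(q / p_ref) = -reward - beta * f
    grad = -reward - beta * f
    f = f + eta * grad                 # accumulated-gradient update; f plays the role of g^(k)

q_final = density(f)
mean_final = float((xs * q_final).sum() * dx)
```

At the fixed point f* = −reward/β, the sketch recovers the tilted density q* ∝ exp(reward/β)·p_ref, so the mass of q_final shifts from the reference mean 0 toward the reward peak at x = 1; in the paper this exact update is replaced by training a neural network f_{k+1} to approximate g^(k).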