Adding Conditional Control to Diffusion Models with Reinforcement Learning
Authors: Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare CTRL with five baselines: (1) Reconstruction Guidance... Experimentally, we validate the superiority of CTRL over baselines in both single-task and multi-task conditional image generation, such as generating highly aesthetic yet compressible images, where existing methods often struggle. Table 1 summarizes the main features of the proposed algorithm compared to existing methods. ... Section 6 is titled "EXPERIMENTS" and includes detailed results, evaluation tables (Table 1b, 1c, 2b), figures (Figure 1, 2) showing generated images, histograms, and training curves, and comparisons to multiple baselines using metrics like Accuracy, Macro F1 score, BRISQUE, and CLIP Score. |
| Researcher Affiliation | Collaboration | Yulai Zhao (Princeton University); Masatoshi Uehara (Genentech); Gabriele Scalia (Genentech); Sunyuan Kung (Princeton University); Tommaso Biancalani (Genentech); Sergey Levine (University of California, Berkeley); Ehsan Hajiramezanali (Genentech) |
| Pseudocode | Yes | Algorithm 1: Conditioning pre-trained diffusion models with Reinforcement Learning (CTRL); Algorithm 2: Direct back-propagation for conditioning |
| Open Source Code | Yes | The code is available at https://github.com/zhaoyl18/CTRL. ... We submit the code for our image experiments as supplementary materials. |
| Open Datasets | Yes | For image experiments (Section 6.1, Section 6.2), we use Stable Diffusion v1.5 (Rombach et al., 2022) as the pre-trained model ppre(x|c)... For the additional control y, we validate compressibilities and aesthetic scores. ... The offline dataset is constructed by labeling a subset of 10k images of the AVA dataset (Murray et al., 2012)... To train the classifier, we use the full AVA dataset (Murray et al., 2012) which includes more than 250k human evaluations. |
| Dataset Splits | Yes | Note that in training both classifiers, we split the dataset with 80% for training and 20% for validation. |
| Hardware Specification | Yes | We use 4 A100 GPUs for all the image tasks. |
| Software Dependencies | No | The paper mentions specific software components like "AdamW optimizer", "DDIM sampler", "LoRA modules", and "Gradient checkpointing", but does not provide specific version numbers for any of these or for general programming environments (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov and Hutter, 2019) with β1 = 0.9, β2 = 0.999 and weight decay of 0.1. ... Table 2: Training hyperparameters. ... Classifier-free guidance weight on prompts (i.e., c): 7.5; γ (i.e., strength of the additional guidance on y): 10; DDIM steps: 50; Truncated back-propagation step K: Uniform(0, 50); Learning rate for LoRA modules: 1e-3; Learning rate for the linear embeddings: 1e-2; Batch size (per gradient update): 256; Number of gradient updates per epoch: 2; Epochs: 15. |
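The reported setup (Table 2 hyperparameters, the Uniform(0, 50) truncated back-propagation step, and the 80/20 classifier split) can be collected into a minimal pure-Python sketch for reference. The names `CTRL_CONFIG`, `sample_truncation_step`, and `train_val_split` are illustrative placeholders, not identifiers from the authors' released code:

```python
import random

# Hyperparameter values verbatim from the paper's Table 2.
# (Dictionary keys are illustrative, not from the released repository.)
CTRL_CONFIG = {
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),
    "weight_decay": 0.1,
    "cfg_weight_prompts": 7.5,   # classifier-free guidance weight on prompts c
    "gamma": 10,                 # strength of the additional guidance on y
    "ddim_steps": 50,
    "lr_lora": 1e-3,             # learning rate for LoRA modules
    "lr_linear_embeddings": 1e-2,
    "batch_size": 256,           # per gradient update
    "grad_updates_per_epoch": 2,
    "epochs": 15,
}

def sample_truncation_step(rng: random.Random) -> int:
    """Draw the truncated back-propagation step K ~ Uniform(0, 50)."""
    return rng.randint(0, CTRL_CONFIG["ddim_steps"])

def train_val_split(items, train_frac=0.8, seed=0):
    """80% train / 20% validation split, as used for both classifiers."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

This records only what the paper states; versions of PyTorch, CUDA, and the DDIM/LoRA implementations remain unspecified in the source.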