Adding Conditional Control to Diffusion Models with Reinforcement Learning
Authors: Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Sunyuan Kung, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare CTRL with five baselines: (1) Reconstruction Guidance... Experimentally, we validate the superiority of CTRL over baselines in both single-task and multi-task conditional image generation, such as generating highly aesthetic yet compressible images, where existing methods often struggle. Table 1 summarizes the main features of the proposed algorithm compared to existing methods. ... Section 6 is titled "EXPERIMENTS" and includes detailed results, evaluation tables (Table 1b, 1c, 2b), figures (Figure 1, 2) showing generated images, histograms, and training curves, and comparisons to multiple baselines using metrics like Accuracy, Macro F1 score, BRISQUE, and CLIP Score. |
| Researcher Affiliation | Collaboration | Yulai Zhao (Princeton University); Masatoshi Uehara (Genentech); Gabriele Scalia (Genentech); Sunyuan Kung (Princeton University); Tommaso Biancalani (Genentech); Sergey Levine (University of California, Berkeley); Ehsan Hajiramezanali (Genentech) |
| Pseudocode | Yes | Algorithm 1: Conditioning pre-trained diffusion models with Reinforcement Learning (CTRL); Algorithm 2: Direct back-propagation for conditioning |
| Open Source Code | Yes | The code is available at https://github.com/zhaoyl18/CTRL. ... We submit the code for our image experiments as supplementary materials. |
| Open Datasets | Yes | For image experiments (Section 6.1, Section 6.2), we use Stable Diffusion v1.5 (Rombach et al., 2022) as the pre-trained model ppre(x|c)... For the additional control y, we validate compressibilities and aesthetic scores. ... The offline dataset is constructed by labeling a subset of 10k images of the AVA dataset (Murray et al., 2012)... To train the classifier, we use the full AVA dataset (Murray et al., 2012) which includes more than 250k human evaluations. |
| Dataset Splits | Yes | Note that in training both classifiers, we split the dataset with 80% for training and 20% for validation. |
| Hardware Specification | Yes | We use 4 A100 GPUs for all the image tasks. |
| Software Dependencies | No | The paper mentions specific software components like "AdamW optimizer", "DDIM sampler", "LoRA modules", and "Gradient checkpointing", but does not provide specific version numbers for any of these or for general programming environments (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov and Hutter, 2019) with β1 = 0.9, β2 = 0.999 and weight decay of 0.1. ... Table 2: Training hyperparameters. ... Classifier-free guidance weight on prompts (i.e., c): 7.5; γ (i.e., strength of the additional guidance on y): 10; DDIM steps: 50; Truncated back-propagation step K: Uniform(0, 50); Learning rate for LoRA modules: 1e-3; Learning rate for the linear embeddings: 1e-2; Batch size (per gradient update): 256; Number of gradient updates per epoch: 2; Epochs: 15. |
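The reported setup (Table 2 hyperparameters, the Uniform(0, 50) truncated back-propagation step, and the 80/20 classifier split) can be collected into a minimal pure-Python sketch for reference. The names `CTRL_CONFIG`, `sample_truncation_step`, and `train_val_split` are illustrative placeholders, not identifiers from the authors' released code:

```python
import random

# Hyperparameter values verbatim from the paper's Table 2.
# (Dictionary keys are illustrative, not from the released repository.)
CTRL_CONFIG = {
    "optimizer": "AdamW",
    "betas": (0.9, 0.999),
    "weight_decay": 0.1,
    "cfg_weight_prompts": 7.5,   # classifier-free guidance weight on prompts c
    "gamma": 10,                 # strength of the additional guidance on y
    "ddim_steps": 50,
    "lr_lora": 1e-3,             # learning rate for LoRA modules
    "lr_linear_embeddings": 1e-2,
    "batch_size": 256,           # per gradient update
    "grad_updates_per_epoch": 2,
    "epochs": 15,
}

def sample_truncation_step(rng: random.Random) -> int:
    """Draw the truncated back-propagation step K ~ Uniform(0, 50)."""
    return rng.randint(0, CTRL_CONFIG["ddim_steps"])

def train_val_split(items, train_frac=0.8, seed=0):
    """80% train / 20% validation split, as used for both classifiers."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

This records only what the paper states; versions of PyTorch, CUDA, and the DDIM/LoRA implementations remain unspecified in the source.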