Improving Compositional Generation with Diffusion Models Using Lift Scores

Authors: Chenning Yu, Sicun Gao

ICML 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through extensive experiments, we demonstrate that lift scores significantly improved the condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis." |
| Researcher Affiliation | Academia | "Department of Computer Science and Engineering, UC San Diego, La Jolla, USA. Correspondence to: Chenning Yu <EMAIL>, Sicun Gao <EMAIL>." |
| Pseudocode | Yes | Algorithm 1: Rejection with Comp Lift; Algorithm 2: Optimized Rejection with Cached Comp Lift; Algorithm 3: Compose for multiple algebraic operations; Algorithm 4: Text-to-Image Generation with Comp Lift; Algorithm 5: Cache Values During Text-to-Image Generation; Algorithm 6: Comp Lift Using Cached Values |
| Open Source Code | Yes | "Our code is available at rainorangelemon.github.io/complift." |
| Open Datasets | Yes | "The CLEVR Position dataset (Johnson et al., 2017) is a dataset of rendered images with a variety of objects placed in different positions." |
| Dataset Splits | Yes | "For each composition of conditions, our algorithm uses Composable Diffusion (Liu et al., 2022) to generate the initial 10 samples, then applies the Lift criterion to accept or reject samples. We test all methods with 5000 combinations of positions with various numbers of constraints." |
| Hardware Specification | No | The paper mentions "GPU memory" but does not specify the hardware (e.g., GPU model, CPU, RAM) used to run the experiments. |
| Software Dependencies | No | The paper mentions "PyTorch samplers" but does not provide version numbers for PyTorch or other libraries. |
| Experiment Setup | Yes | "We use the diffusion process of 50 timesteps for both training and inference, which means the cached version of Comp Lift uses 50 trials. The vanilla version of Comp Lift uses 200 trials. All methods use classifier-free guidance with a guidance scale of 7.5 (Ho & Salimans, 2022). We fix τ = 250 for all experiments." |
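The Dataset Splits and Experiment Setup rows describe the core procedure: generate candidate samples with a composed diffusion model, estimate a lift score for each condition over a number of noise trials, and keep only samples whose lift is positive for every condition. A minimal sketch of that rejection loop, assuming a hypothetical denoiser interface `eps(x_t, t)` and a toy forward process (the names `estimate_lift` and `accept`, and the simplified noise schedule, are illustrative, not the paper's implementation):

```python
import numpy as np

def estimate_lift(x, eps_cond, eps_uncond, num_trials=200, rng=None):
    """Monte Carlo estimate of a lift-style score for sample x.

    Averages the gap between unconditional and conditional denoising
    errors over random timesteps and noise draws; positive values mean
    the condition explains x better than the unconditional model
    (a hypothetical simplification of the paper's estimator).
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_trials):
        t = rng.uniform(1e-3, 1.0)              # random diffusion time, bounded away from 0
        noise = rng.standard_normal(x.shape)    # fresh Gaussian noise
        x_t = np.sqrt(1.0 - t) * x + np.sqrt(t) * noise  # toy forward process
        err_uncond = np.sum((noise - eps_uncond(x_t, t)) ** 2)
        err_cond = np.sum((noise - eps_cond(x_t, t)) ** 2)
        total += err_uncond - err_cond          # positive when the condition helps denoising
    return total / num_trials

def accept(x, cond_models, uncond_model, num_trials=200, rng=None):
    """Accept x only if its estimated lift is positive for every condition."""
    return all(
        estimate_lift(x, m, uncond_model, num_trials, rng) > 0
        for m in cond_models
    )
```

Per the quoted setup, the vanilla variant runs 200 fresh trials per sample as above, while the cached variant (Algorithms 2, 5, and 6) reuses values already computed during the 50 inference timesteps, so no extra denoiser calls are needed at rejection time.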