Improving Compositional Generation with Diffusion Models Using Lift Scores

Authors: Chenning Yu, Sicun Gao

ICML 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through extensive experiments, we demonstrate that lift scores significantly improved the condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis." |
| Researcher Affiliation | Academia | "Department of Computer Science and Engineering, UC San Diego, La Jolla, USA. Correspondence to: Chenning Yu <EMAIL>, Sicun Gao <EMAIL>." |
| Pseudocode | Yes | Algorithm 1: Rejection with Comp Lift; Algorithm 2: Optimized Rejection with Cached Comp Lift; Algorithm 3: Compose for multiple algebraic operations; Algorithm 4: Text-to-Image Generation with Comp Lift; Algorithm 5: Cache Values During Text-to-Image Generation; Algorithm 6: Comp Lift Using Cached Values |
| Open Source Code | Yes | "Our code is available at rainorangelemon.github.io/complift." |
| Open Datasets | Yes | "The CLEVR Position dataset (Johnson et al., 2017) is a dataset of rendered images with a variety of objects placed in different positions." |
| Dataset Splits | Yes | "For each composition of conditions, our algorithm uses Composable Diffusion (Liu et al., 2022) to generate the initial 10 samples, then applies the Lift criterion to accept or reject samples. We test all methods with 5000 combinations of positions with various numbers of constraints." |
| Hardware Specification | No | The paper mentions "GPU memory" but does not specify the hardware (e.g., GPU model, CPU, RAM) used to run the experiments. |
| Software Dependencies | No | The paper mentions "PyTorch samplers" but does not provide version numbers for PyTorch or other libraries. |
| Experiment Setup | Yes | "We use the diffusion process of 50 timesteps for both training and inference, which means the cached version of Comp Lift uses 50 trials. The vanilla version of Comp Lift uses 200 trials. All methods use classifier-free guidance with a guidance scale of 7.5 (Ho & Salimans, 2022). We fix τ = 250 for all experiments." |
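The Dataset Splits and Experiment Setup rows describe the core procedure: generate candidate samples with a composed diffusion model, estimate a lift score for each condition over a number of noise trials, and keep only samples whose lift is positive for every condition. A minimal sketch of that rejection loop, assuming a hypothetical denoiser interface `eps(x_t, t)` and a toy forward process (the names `estimate_lift` and `accept`, and the simplified noise schedule, are illustrative, not the paper's implementation):

```python
import numpy as np

def estimate_lift(x, eps_cond, eps_uncond, num_trials=200, rng=None):
    """Monte Carlo estimate of a lift-style score for sample x.

    Averages the gap between unconditional and conditional denoising
    errors over random timesteps and noise draws; positive values mean
    the condition explains x better than the unconditional model
    (a hypothetical simplification of the paper's estimator).
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_trials):
        t = rng.uniform(1e-3, 1.0)              # random diffusion time, bounded away from 0
        noise = rng.standard_normal(x.shape)    # fresh Gaussian noise
        x_t = np.sqrt(1.0 - t) * x + np.sqrt(t) * noise  # toy forward process
        err_uncond = np.sum((noise - eps_uncond(x_t, t)) ** 2)
        err_cond = np.sum((noise - eps_cond(x_t, t)) ** 2)
        total += err_uncond - err_cond          # positive when the condition helps denoising
    return total / num_trials

def accept(x, cond_models, uncond_model, num_trials=200, rng=None):
    """Accept x only if its estimated lift is positive for every condition."""
    return all(
        estimate_lift(x, m, uncond_model, num_trials, rng) > 0
        for m in cond_models
    )
```

Per the quoted setup, the vanilla variant runs 200 fresh trials per sample as above, while the cached variant (Algorithms 2, 5, and 6) reuses values already computed during the 50 inference timesteps, so no extra denoiser calls are needed at rejection time.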