Improving Compositional Generation with Diffusion Models Using Lift Scores
Authors: Chenning Yu, Sicun Gao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that lift scores significantly improved the condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, UC San Diego, La Jolla, USA. Correspondence to: Chenning Yu <EMAIL>, Sicun Gao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Rejection with Comp Lift; Algorithm 2 Optimized Rejection with Cached Comp Lift; Algorithm 3 Compose for multiple algebraic operations; Algorithm 4 Text-to-Image Generation with Comp Lift; Algorithm 5 Cache Values During Text-to-Image Generation; Algorithm 6 Comp Lift Using Cached Values |
| Open Source Code | Yes | Our code is available at rainorangelemon.github.io/complift. |
| Open Datasets | Yes | The CLEVR Position dataset (Johnson et al., 2017) is a dataset of rendered images with a variety of objects placed in different positions. |
| Dataset Splits | Yes | For each composition of conditions, our algorithm uses Composable Diffusion (Liu et al., 2022) to generate the initial 10 samples, then applies the Lift criterion to accept or reject samples. We test all methods with 5000 combinations of positions with various numbers of constraints. |
| Hardware Specification | No | The paper mentions 'GPU memory' but does not provide specific details on the hardware (e.g., GPU model, CPU, RAM) used for running experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch samplers' but does not provide specific version numbers for PyTorch or other libraries. |
| Experiment Setup | Yes | We use the diffusion process of 50 timesteps for both training and inference, which means the cached version of Comp Lift uses 50 trials. The vanilla version of Comp Lift uses 200 trials. All methods use classifier-free guidance (CFG) with a guidance scale of 7.5 (Ho & Salimans, 2022). We fix τ = 250 for all experiments. |