IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Authors: Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate our significant superiority over previous methods, particularly in multi-category object composition and complex semantic alignment. IterComp opens new research avenues in reward feedback learning for diffusion models and compositional generation. |
| Researcher Affiliation | Collaboration | 1Tsinghua University 2Peking University 3LibAI Lab 4USTC 5University of Oxford 6Princeton University |
| Pseudocode | Yes | Algorithm 1 Iterative Composition-aware Feedback Learning |
| Open Source Code | Yes | https://github.com/YangLing0818/IterComp |
| Open Datasets | Yes | We randomly select 500 prompts from each of the following categories: color, shape, and texture in the T2I-CompBench (Huang et al., 2023), resulting in a total of 1,500 prompts. During the iterative feedback learning process, we randomly select 10,000 prompts from DiffusionDB (Wang et al., 2022) and use SDXL (Betker et al., 2023) as the base diffusion model |
| Dataset Splits | No | The paper describes selecting prompts and collecting image-rank pairs to form a dataset, but it does not specify explicit training, validation, or test splits for this dataset or for the 10,000 prompts selected from DiffusionDB used for finetuning the base model. |
| Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions BLIP as a feature extractor but does not provide any specific version numbers for BLIP or other software libraries/frameworks (e.g., PyTorch, Python). |
| Experiment Setup | Yes | For training the three reward models, we finetune BLIP and the learnable MLP with a learning rate of 1e-5 and a batch size of 64. ... finetuning it with a learning rate of 1e-5 and a batch size of 4. We set T = 40, [T1, T2] = [1, 10], ϕ = ReLU, and λ = 1e-3. |
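The reported hyperparameters above can be collected into a minimal configuration sketch. This is not the authors' code; the class and field names (`RewardModelConfig`, `DiffusionFinetuneConfig`, `reward_feedback_active`) are hypothetical, and only the numeric values come from the paper's quoted setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardModelConfig:
    """Reported settings for finetuning BLIP + the learnable MLP reward heads."""
    learning_rate: float = 1e-5
    batch_size: int = 64

@dataclass(frozen=True)
class DiffusionFinetuneConfig:
    """Reported settings for finetuning the base diffusion model (SDXL)."""
    learning_rate: float = 1e-5
    batch_size: int = 4
    total_timesteps: int = 40                    # T = 40
    reward_timestep_range: tuple = (1, 10)       # [T1, T2] = [1, 10]
    activation: str = "relu"                     # phi = ReLU
    reg_weight: float = 1e-3                     # lambda = 1e-3

def reward_feedback_active(t: int, cfg: DiffusionFinetuneConfig) -> bool:
    """Hypothetical helper: whether step t falls in the [T1, T2] window
    where reward feedback is applied during denoising."""
    lo, hi = cfg.reward_timestep_range
    return lo <= t <= hi
```

For example, with the default `DiffusionFinetuneConfig`, reward feedback would be active at step 5 but not at step 20 of the 40-step schedule, under the assumed reading that [T1, T2] bounds the feedback window.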