IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Authors: Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate our significant superiority over previous methods, particularly in multi-category object composition and complex semantic alignment. IterComp opens new research avenues in reward feedback learning for diffusion models and compositional generation.
Researcher Affiliation | Collaboration | 1 Tsinghua University, 2 Peking University, 3 LibAI Lab, 4 USTC, 5 University of Oxford, 6 Princeton University
Pseudocode | Yes | Algorithm 1: Iterative Composition-aware Feedback Learning
Open Source Code | Yes | https://github.com/YangLing0818/IterComp
Open Datasets | Yes | We randomly select 500 prompts from each of the following categories: color, shape, and texture in T2I-CompBench (Huang et al., 2023), resulting in a total of 1,500 prompts. During the iterative feedback learning process, we randomly select 10,000 prompts from DiffusionDB (Wang et al., 2022) and use SDXL (Betker et al., 2023) as the base diffusion model.
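The prompt-selection step quoted above (500 prompts from each of three T2I-CompBench categories, 1,500 total) can be sketched as follows. This is an illustrative reconstruction, not code from the IterComp repository; the function name `select_prompts` and the toy prompt pools are assumptions.

```python
import random

# Hypothetical sketch of the reported prompt selection: draw 500 prompts
# from each T2I-CompBench category (color, shape, texture), 1,500 in total.
def select_prompts(prompts_by_category, per_category=500, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    selected = []
    for category in ("color", "shape", "texture"):
        pool = prompts_by_category[category]
        selected.extend(rng.sample(pool, per_category))  # without replacement
    return selected

# Toy pools standing in for the real T2I-CompBench prompt lists.
pools = {c: [f"{c} prompt {i}" for i in range(1000)]
         for c in ("color", "shape", "texture")}
prompts = select_prompts(pools)
print(len(prompts))  # 1500
```

The same sampling call would apply to the 10,000-prompt draw from DiffusionDB, just with a single pool and a larger `per_category`.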
Dataset Splits | No | The paper describes selecting prompts and collecting image-rank pairs to form a dataset, but does not specify explicit training, validation, or test splits for this dataset or for the 10,000 prompts selected from DiffusionDB used to finetune the base model.
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions BLIP as a feature extractor but does not provide specific version numbers for BLIP or for other software libraries/frameworks (e.g., PyTorch, Python).
Experiment Setup | Yes | For training the three reward models, we finetune BLIP and the learnable MLP with a learning rate of 1e-5 and a batch size of 64. ... finetuning it with a learning rate of 1e-5 and a batch size of 4. We set T = 40, [T1, T2] = [1, 10], ϕ = ReLU, and λ = 1e-3.
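For readers reimplementing the setup, the reported hyperparameters can be collected into configuration dictionaries as below. This is a hedged sketch: the dictionary names and key spellings are illustrative and do not come from the IterComp codebase; only the numeric values are from the quoted setup.

```python
# Hyperparameters as reported in the experiment-setup response above.
# Config names (reward_model_cfg, diffusion_finetune_cfg) are assumptions.
reward_model_cfg = {
    "backbone": "BLIP",       # finetuned jointly with a learnable MLP head
    "learning_rate": 1e-5,
    "batch_size": 64,
}

diffusion_finetune_cfg = {
    "base_model": "SDXL",
    "learning_rate": 1e-5,
    "batch_size": 4,
    "T": 40,                          # total denoising steps
    "feedback_step_range": (1, 10),   # the reported [T1, T2] interval
    "phi": "ReLU",                    # the reported mapping function ϕ
    "lambda": 1e-3,                   # the reported weighting coefficient λ
}
```

Reproducing the reported results would additionally require the unversioned dependencies noted under "Software Dependencies".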