CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation

Authors: Minghao Fu, Guo-Hua Wang, Liangfu Cao, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that CHATS surpasses traditional preference alignment methods, setting new state-of-the-art across various standard benchmarks. Empirical evaluations on two mainstream text-to-image generation frameworks, diffusion (Podell et al., 2024) and flow matching (Liu et al., 2023; Lipman et al., 2023), underscore the superiority of our method. We utilize publicly available benchmark prompts from GenEval (Ghosh et al., 2023), DPG-Bench (Hu et al., 2024), and HPS v2 (Wu et al., 2023). We employ multiple evaluation metrics, including HPS v2 (Wu et al., 2023), ImageReward (Xu et al., 2024), and PickScore (Kirstain et al., 2023b).
Researcher Affiliation Collaboration Minghao Fu (1,2,3), Guo-Hua Wang (3), Liangfu Cao (3), Qing-Guo Chen (3), Zhao Xu (3), Weihua Luo (3), Kaifu Zhang (3). (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) School of Artificial Intelligence, Nanjing University; (3) Alibaba Group. Correspondence to: Minghao Fu <EMAIL>, Guo-Hua Wang <EMAIL>.
Pseudocode No The paper describes mathematical derivations and algorithmic steps in prose, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes The code is publicly available at github.com/AIDC-AI/CHATS.
Open Datasets Yes We conduct experiments primarily on two preference optimization datasets, Pick-a-Pic v2 (PaP v2) (Kirstain et al., 2023a) and Open Image Preferences (OIP) (Data is Better Together, 2024). Open Image Preferences. https://huggingface.co/datasets/data-is-better-together/open-image-preferences-v1-binarized, 2024.
Dataset Splits No The paper mentions using 'Pick-a-Pic v2' and 'Open Image Preferences' datasets for finetuning and specifies benchmark prompts for evaluation (GenEval, DPG-Bench, HPS v2), but it does not provide specific training/validation/test splits for its own finetuning process or for the preference datasets themselves.
Hardware Specification Yes Throughput with 50 sampling steps, measured on NVIDIA A100 GPU with BF16 inference.
Software Dependencies No The paper mentions using Adafactor (Shazeer & Stern, 2018) and AdamW (Loshchilov & Hutter, 2019) as optimizers, but does not provide specific versions for underlying software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup Yes Training is conducted with an effective batch size of 512, maintaining an image resolution of 1024. The default learning rate is set to 1×10^-8, and a learning rate scaling strategy based on batch-size increases is utilized to accelerate the finetuning. T (cf. Eq. 13 and Eq. 14) is fixed as 1000. During sampling, by default we keep s and α as 5 and 0.5, respectively.
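The reported hyperparameters can be collected into a single configuration sketch for reimplementation. This is an illustrative summary only: the field names and the `scaled_lr` helper (including its reference batch size and linear scaling rule) are assumptions, since the paper states that a batch-size-based scaling strategy is used but does not specify the exact rule; the values are those quoted above.

```python
# Hypothetical hyperparameter summary of the reported CHATS setup.
# Field names are illustrative, not taken from the official codebase;
# values are those quoted in the experiment-setup row above.
chats_config = {
    "effective_batch_size": 512,
    "image_resolution": 1024,
    "base_learning_rate": 1e-8,  # scaled up with batch size to accelerate finetuning
    "num_timesteps_T": 1000,     # T, cf. Eq. 13 and Eq. 14
    "guidance_scale_s": 5.0,     # default sampling value of s
    "alpha": 0.5,                # default sampling value of α
}


def scaled_lr(base_lr: float, batch_size: int, ref_batch_size: int = 256) -> float:
    """Linear learning-rate scaling with batch size.

    A common convention, used here only as a placeholder: the paper does
    not state which scaling rule or reference batch size it employs.
    """
    return base_lr * batch_size / ref_batch_size
```

For example, under this placeholder rule a batch size of 512 against a reference of 256 would double the base rate to 2×10^-8; any faithful reproduction should confirm the actual rule against the released code at github.com/AIDC-AI/CHATS.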