Measuring And Improving Engagement of Text-to-Image Generation Models

Authors: Varun Khurana, Yaman Singla, Jayakumar Subramanian, Changyou Chen, Rajiv Ratn Shah, Zhiqiang Xu, Balaji Krishnamurthy

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We collect EngagingImageNet, the first large-scale dataset of images, along with associated user engagement metrics. Further, we find that existing image evaluation metrics like aesthetics, CLIPScore, PickScore, ImageReward, etc. are unable to capture viewer engagement. To address the lack of reliable metrics for assessing image utility, we use the EngagingImageNet dataset to train EngageNet, an engagement-aware Vision Language Model (VLM) that predicts viewer engagement of images... We then explore methods to enhance the engagement of text-to-image models... We present the results of these experiments in Section 4 and report the efficacy of each method in generating more engaging images.
Researcher Affiliation | Collaboration | Adobe Media and Data Science Research, SUNY at Buffalo, IIIT-Delhi
Pseudocode | No | The paper describes methods in paragraph text and equations. There are 'Listings' in the appendix showing verbalization patterns, but these are not structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement.
Open Datasets | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement.
Dataset Splits | Yes | Finally, we end up with a dataset of 957,809 samples, which we split into training and testing sets. We randomly sampled nearly 2000 samples from each bucket for testing, with the remaining samples used for training.
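The bucketed train/test split quoted above can be sketched as follows. Only the per-bucket test size (~2000 samples) comes from the paper; the function name, the `"bucket"` key, and the sample representation are illustrative assumptions:

```python
import random
from collections import defaultdict

def bucketed_split(samples, test_per_bucket=2000, seed=0):
    """Hold out roughly test_per_bucket samples from each engagement
    bucket for testing; everything else goes to training.
    Each sample is assumed to be a dict with a "bucket" key."""
    rng = random.Random(seed)
    by_bucket = defaultdict(list)
    for s in samples:
        by_bucket[s["bucket"]].append(s)

    train, test = [], []
    for items in by_bucket.values():
        rng.shuffle(items)
        # If a bucket has fewer samples than requested, take them all.
        k = min(test_per_bucket, len(items))
        test.extend(items[:k])
        train.extend(items[k:])
    return train, test
```

Sampling a fixed count per bucket (rather than a fixed fraction of the whole dataset) keeps the test set balanced across engagement levels even when the buckets themselves are imbalanced.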
Hardware Specification | No | The paper discusses models such as LLaVA-1.5, Stable Diffusion, GPT-3.5, and GPT-4V, but does not specify the hardware (e.g., GPU models, CPU types, or TPU versions) used to run its own experiments.
Software Dependencies | No | The paper mentions software such as LLaVA, FAISS, and spaCy, but does not provide version numbers for these or any other key software components used in its experiments.
Experiment Setup | Yes | The model was finetuned for 50 epochs, following the procedure outlined by von Platen et al. (2023). ...the final loss function for EngageNet is given by: L_EngageNet = L_CE + λ·L_MSE (1), where λ is a hyperparameter that controls the weight of the auxiliary loss. We set λ = 0.1 in our experiments.
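The quoted objective combines a cross-entropy term with a λ-weighted auxiliary MSE term (λ = 0.1). A minimal, framework-free sketch of that combination is below; only the L_CE + λ·L_MSE structure and the λ value come from the quote, while the function names and the choice of per-example targets are illustrative:

```python
import math

def cross_entropy(class_probs, target_idx):
    """Cross-entropy for one example, given predicted class probabilities."""
    return -math.log(class_probs[target_idx])

def mse(pred, target):
    """Squared error for one scalar prediction."""
    return (pred - target) ** 2

def engage_loss(class_probs, target_idx, pred_score, target_score, lam=0.1):
    """Combined objective L = L_CE + lam * L_MSE, with lam = 0.1 as in
    the paper. Pairing a classification head (CE) with an auxiliary
    regression head (MSE) mirrors the quoted loss; the heads themselves
    are assumptions of this sketch."""
    return cross_entropy(class_probs, target_idx) + lam * mse(pred_score, target_score)
```

With a small λ, the MSE term acts as a soft regularizer on the regression head while the cross-entropy term dominates training, which matches its description as an "auxiliary loss".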