Measuring And Improving Engagement of Text-to-Image Generation Models
Authors: Varun Khurana, Yaman Singla, Jayakumar Subramanian, Changyou Chen, Rajiv Ratn Shah, Zhiqiang Xu, Balaji Krishnamurthy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect EngagingImageNet, the first large-scale dataset of images, along with associated user engagement metrics. Further, we find that existing image evaluation metrics like aesthetics, CLIPScore, PickScore, ImageReward, etc. are unable to capture viewer engagement. To address the lack of reliable metrics for assessing image utility, we use the EngagingImageNet dataset to train EngageNet, an engagement-aware Vision Language Model (VLM) that predicts viewer engagement of images... We then explore methods to enhance the engagement of text-to-image models... We present the results of these experiments in Section 4 and report the efficacy of each method in generating more engaging images. |
| Researcher Affiliation | Collaboration | Adobe Media and Data Science Research, SUNY at Buffalo, IIIT-Delhi |
| Pseudocode | No | The paper describes its methods in paragraph text and equations. The appendix contains 'Listings' showing verbalization patterns, but these are prompt templates, not structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement. |
| Open Datasets | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement. |
| Dataset Splits | Yes | Finally, we end up with a dataset of 957,809 samples, which we split into training and testing sets. We randomly sampled nearly 2000 samples from each bucket for testing, with the remaining samples used for training. |
| Hardware Specification | No | The paper discusses various models like LLaVA-1.5, Stable Diffusion, GPT-3.5, and GPT-4V, but does not specify the hardware (e.g., GPU models, CPU types, or TPU versions) used for running its own experiments. |
| Software Dependencies | No | The paper mentions software like LLaVA, FAISS, and Spacy, but does not provide specific version numbers for these or any other key software components used in their experiments. |
| Experiment Setup | Yes | The model was finetuned for 50 epochs, following the procedure outlined by von Platen et al. (2023). ...the final loss function for EngageNet is given by: L_EOIG = L_CE + λ·L_MSE (1), where λ is a hyperparameter that controls the weight of the auxiliary loss. We set λ = 0.1 in our experiments. |
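The quoted loss combines a cross-entropy term with an MSE auxiliary term weighted by λ = 0.1. A minimal NumPy sketch of that combined objective is below; the function name, input shapes, and the assumption that the CE term scores engagement buckets while the MSE term regresses an engagement score are illustrative, since the paper excerpt does not specify the model heads.

```python
import numpy as np

def engagement_loss(logits, class_targets, score_preds, score_targets, lam=0.1):
    """Sketch of the EngageNet-style objective: L = L_CE + lam * L_MSE.

    logits:        (N, K) unnormalized scores over K engagement buckets
    class_targets: (N,)   integer bucket labels
    score_preds:   (N,)   predicted continuous engagement scores
    score_targets: (N,)   ground-truth engagement scores
    lam:           auxiliary-loss weight (the paper sets 0.1)
    """
    # Numerically stable softmax cross-entropy over the bucket logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(class_targets)), class_targets].mean()

    # Auxiliary mean-squared-error term on the continuous scores.
    mse = np.mean((score_preds - score_targets) ** 2)

    return ce + lam * mse
```

With uniform two-class logits the CE term equals log 2, so the total is log 2 plus λ times the squared score error.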