Measuring And Improving Engagement of Text-to-Image Generation Models
Authors: Varun Khurana, Yaman Singla, Jayakumar Subramanian, Changyou Chen, Rajiv Ratn Shah, Zhiqiang Xu, Balaji Krishnamurthy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect EngagingImageNet, the first large-scale dataset of images, along with associated user engagement metrics. Further, we find that existing image evaluation metrics like aesthetics, CLIPScore, PickScore, ImageReward, etc. are unable to capture viewer engagement. To address the lack of reliable metrics for assessing image utility, we use the EngagingImageNet dataset to train EngageNet, an engagement-aware Vision Language Model (VLM) that predicts viewer engagement of images... We then explore methods to enhance the engagement of text-to-image models... We present the results of these experiments in Section 4 and report the efficacy of each method in generating more engaging images. |
| Researcher Affiliation | Collaboration | Adobe Media and Data Science Research, SUNY at Buffalo, IIIT-Delhi |
| Pseudocode | No | The paper describes its methods in paragraph text and equations. The appendix contains 'Listings' showing verbalization patterns, but these are prompt templates, not structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement. |
| Open Datasets | Yes | We have released our code and dataset on behavior-in-the-wild.github.io/image-engagement. |
| Dataset Splits | Yes | Finally, we end up with a dataset of 957,809 samples, which we split into training and testing sets. We randomly sampled nearly 2000 samples from each bucket for testing, with the remaining samples used for training. |
| Hardware Specification | No | The paper discusses various models like LLaVA-1.5, Stable Diffusion, GPT-3.5, and GPT-4V, but does not specify the hardware (e.g., GPU models, CPU types, or TPU versions) used for running its own experiments. |
| Software Dependencies | No | The paper mentions software like LLaVA, FAISS, and Spacy, but does not provide specific version numbers for these or any other key software components used in their experiments. |
| Experiment Setup | Yes | The model was finetuned for 50 epochs, following the procedure outlined by von Platen et al. (2023). ...the final loss function for EngageNet is given by: L_EOIG = L_CE + λ·L_MSE (1), where λ is a hyperparameter that controls the weight of the auxiliary loss. We set λ = 0.1 in our experiments. |
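The quoted loss combines a cross-entropy term with an MSE auxiliary term weighted by λ = 0.1. A minimal NumPy sketch of that combined objective is below; the function name, input shapes, and the assumption that the CE term scores engagement buckets while the MSE term regresses an engagement score are illustrative, since the paper excerpt does not specify the model heads.

```python
import numpy as np

def engagement_loss(logits, class_targets, score_preds, score_targets, lam=0.1):
    """Sketch of the EngageNet-style objective: L = L_CE + lam * L_MSE.

    logits:        (N, K) unnormalized scores over K engagement buckets
    class_targets: (N,)   integer bucket labels
    score_preds:   (N,)   predicted continuous engagement scores
    score_targets: (N,)   ground-truth engagement scores
    lam:           auxiliary-loss weight (the paper sets 0.1)
    """
    # Numerically stable softmax cross-entropy over the bucket logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(class_targets)), class_targets].mean()

    # Auxiliary mean-squared-error term on the continuous scores.
    mse = np.mean((score_preds - score_targets) ** 2)

    return ce + lam * mse
```

With uniform two-class logits the CE term equals log 2, so the total is log 2 plus λ times the squared score error.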