AI-generated Image Quality Assessment in Visual Communication
Authors: Yu Tian, Yixuan Li, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Sam Kwong
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To bridge this gap, we propose AIGI-VC, a quality assessment database for AI-Generated Images in Visual Communication, which studies the communicability of AIGIs in the advertising field from the perspectives of information clarity and emotional interaction. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning. We conduct an empirical study of existing representative IQA methods and large multi-modal models on the AIGI-VC dataset, uncovering their strengths and weaknesses. |
| Researcher Affiliation | Academia | 1City University of Hong Kong, Hong Kong SAR, China 2South China Normal University, Guangzhou, China 3Lingnan University, Hong Kong SAR, China |
| Pseudocode | No | The paper describes methods in paragraph text and uses figures to illustrate processes, but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/ytian73/AIGI-VC. |
| Open Datasets | Yes | In this work, we contribute a dataset called AIGI-VC, the first-of-its-kind database to study the communicability of AI-Generated Images in Visual Communication. The overview of the AIGI-VC dataset is shown in Fig. 1. The AIGI-VC dataset comprises a diverse collection of 2,500 images, encompassing 14 distinct ad topics and representing 8 different types of emotions. Code https://github.com/ytian73/AIGI-VC. |
| Dataset Splits | No | The paper mentions sampling 2,000 pairs from the AIGI-VC dataset for human preference collection and discusses `D_all` and `D_sub` subsets for evaluation based on preference probabilities. However, it does not provide explicit training, validation, or testing splits with percentages or sample counts for model training or evaluation as typically required for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We employ 14 objective metrics for performance comparisons, including one emotion classifier (WSCNet (She et al. 2020)), one vanilla quality assessment metric designed for natural images (HyperIQA (Su et al. 2020)), five CLIP-based metrics tailored for AIGIs (CLIPScore (Hessel et al. 2021), Aesthetic Score, HPSv2 (Wu et al. 2023), ImageReward (Xu et al. 2024), and PickScore (Kirstain et al. 2024)), and seven LMMs that accept multiple images as input (mPLUG-Owl2 (Ye et al. 2023), LLaVA-v1.5-13B (Liu et al. 2024), InternLM-XC.2vl (Dong et al. 2024), BakLLaVA (Skunkworks AI 2024), Idefics2 (Laurençon et al. 2024), Qwen-VL (Bai et al. 2023), and GPT-4o). |
| Experiment Setup | Yes | To ensure fairness, we use the default hyperparameters provided by the original models. |
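The benchmark's coarse-grained preference annotations imply a pairwise (two-alternative forced choice) evaluation: a metric is judged by how often its score ordering for an image pair matches the human-preferred image. A minimal sketch of that accuracy computation is below; `model_score` is a hypothetical stand-in for any of the 14 metrics (e.g. a CLIP-based scorer), and the exact protocol used in the paper may differ.

```python
# Hedged sketch: pairwise (2AFC) preference-prediction accuracy.
# `model_score` is a hypothetical scoring function standing in for any
# IQA metric or LMM; the paper's actual evaluation protocol may differ.

def preference_accuracy(pairs, model_score):
    """pairs: iterable of (img_a, img_b, human_choice), human_choice in {'a', 'b'}.

    Returns the fraction of pairs where the metric's preferred image
    (the one with the higher score) matches the human preference.
    """
    correct = 0
    total = 0
    for img_a, img_b, human_choice in pairs:
        predicted = 'a' if model_score(img_a) >= model_score(img_b) else 'b'
        correct += (predicted == human_choice)
        total += 1
    return correct / total if total else 0.0

# Toy usage: images are dicts with a stored score from a hypothetical metric.
toy_pairs = [
    ({"score": 0.9}, {"score": 0.4}, "a"),
    ({"score": 0.2}, {"score": 0.7}, "b"),
    ({"score": 0.6}, {"score": 0.8}, "a"),  # metric disagrees with humans here
]
acc = preference_accuracy(toy_pairs, lambda img: img["score"])  # 2/3
```

The same loop generalizes to LMM evaluation by letting `model_score` wrap a prompted two-image comparison instead of a scalar metric.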