AI-generated Image Quality Assessment in Visual Communication
Authors: Yu Tian, Yixuan Li, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Sam Kwong
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To bridge this gap, we propose AIGI-VC, a quality assessment database for AI-Generated Images in Visual Communication, which studies the communicability of AIGIs in the advertising field from the perspectives of information clarity and emotional interaction. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning. We conduct an empirical study of existing representative IQA methods and large multi-modal models on the AIGI-VC dataset, uncovering their strengths and weaknesses. |
| Researcher Affiliation | Academia | 1City University of Hong Kong, Hong Kong SAR, China 2South China Normal University, Guangzhou, China 3Lingnan University, Hong Kong SAR, China |
| Pseudocode | No | The paper describes methods in paragraph text and uses figures to illustrate processes, but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/ytian73/AIGI-VC. |
| Open Datasets | Yes | In this work, we contribute a dataset called AIGI-VC, the first-of-its-kind database to study the communicability of AI-Generated Images in Visual Communication. The overview of the AIGI-VC dataset is shown in Fig. 1. The AIGI-VC dataset comprises a diverse collection of 2,500 images, encompassing 14 distinct ad topics and representing 8 different types of emotions. Code https://github.com/ytian73/AIGI-VC. |
| Dataset Splits | No | The paper mentions sampling 2,000 pairs from the AIGI-VC dataset for human preference collection and discusses `D_all` and `D_sub` subsets for evaluation based on preference probabilities. However, it does not provide explicit training, validation, or testing splits with percentages or sample counts for model training or evaluation as typically required for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We employ 14 objective metrics for performance comparisons, including one emotion classifier (WSCNet (She et al. 2020)), one vanilla quality assessment metric designed for natural images (HyperIQA (Su et al. 2020)), five CLIP-based metrics tailored for AIGIs (CLIPScore (Hessel et al. 2021), Aesthetic Score, HPSv2 (Wu et al. 2023), ImageReward (Xu et al. 2024), and PickScore (Kirstain et al. 2024)), and seven LMMs that accept multiple images as input (mPLUG-Owl2 (Ye et al. 2023), LLaVA-v1.5-13B (Liu et al. 2024), InternLM-XC.2vl (Dong et al. 2024), BakLLaVA (Skunkworks AI 2024), Idefics2 (Laurençon et al. 2024), Qwen-VL (Bai et al. 2023), and GPT-4o). |
| Experiment Setup | Yes | To ensure fairness, we use the default hyperparameters provided by the original models. |
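The benchmark's coarse-grained preference annotations imply a pairwise (two-alternative forced choice) evaluation: a metric is judged by how often its score ordering for an image pair matches the human-preferred image. A minimal sketch of that accuracy computation is below; `model_score` is a hypothetical stand-in for any of the 14 metrics (e.g. a CLIP-based scorer), and the exact protocol used in the paper may differ.

```python
# Hedged sketch: pairwise (2AFC) preference-prediction accuracy.
# `model_score` is a hypothetical scoring function standing in for any
# IQA metric or LMM; the paper's actual evaluation protocol may differ.

def preference_accuracy(pairs, model_score):
    """pairs: iterable of (img_a, img_b, human_choice), human_choice in {'a', 'b'}.

    Returns the fraction of pairs where the metric's preferred image
    (the one with the higher score) matches the human preference.
    """
    correct = 0
    total = 0
    for img_a, img_b, human_choice in pairs:
        predicted = 'a' if model_score(img_a) >= model_score(img_b) else 'b'
        correct += (predicted == human_choice)
        total += 1
    return correct / total if total else 0.0

# Toy usage: images are dicts with a stored score from a hypothetical metric.
toy_pairs = [
    ({"score": 0.9}, {"score": 0.4}, "a"),
    ({"score": 0.2}, {"score": 0.7}, "b"),
    ({"score": 0.6}, {"score": 0.8}, "a"),  # metric disagrees with humans here
]
acc = preference_accuracy(toy_pairs, lambda img: img["score"])  # 2/3
```

The same loop generalizes to LMM evaluation by letting `model_score` wrap a prompted two-image comparison instead of a scalar metric.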