Are High-Quality AI-Generated Images More Difficult for Models to Detect?

Authors: Yao Xiao, Binbin Yang, Weiyan Chen, Jiahao Chen, Zijie Cao, Ziyi Dong, Xiangyang Ji, Liang Lin, Wei Ke, Pengxu Wei

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, our systematic study on cutting-edge text-to-image generators reveals a counterintuitive finding: AIGIs with higher quality scores, as assessed by human preference models, tend to be more easily detected by existing models. To investigate this, we examine how the text prompts for generation and image characteristics influence both quality scores and detector accuracy. Furthermore, through clustering and regression analyses, we verify that image characteristics like saturation, contrast, and texture richness collectively impact both image quality and detector accuracy. Finally, we demonstrate that the performance of off-the-shelf detectors can be enhanced across diverse generators and datasets by selecting input patches based on the predicted scores of our regression models, thus substantiating the broader applicability of our findings.
Researcher Affiliation | Academia | 1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; 2) School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; 3) Department of Automation, Tsinghua University, Beijing, China; 4) Peng Cheng Laboratory, Shenzhen, China.
Pseudocode | No | The paper describes methodologies and analyses, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data are available at GitHub.
Open Datasets | Yes | To this end, we construct a high-quality and diverse dataset by 1) collecting real images from four source datasets; 2) obtaining 4,000 captions spanning a wide range of complexity from these real images; and 3) generating fake images using these captions as prompts based on text-to-image generators, e.g., Stable Diffusion 2.1 (SD 2.1) (Rombach et al., 2022), Stable Diffusion XL 1.0 (SDXL 1.0) (Podell et al., 2024), Stable Diffusion 3 (SD 3) (Esser et al., 2024), and PixArt-α (Chen et al., 2024c). ... we collect real images from four existing datasets: COCO (Lin et al., 2014), CC3M (Sharma et al., 2018), LAION-Aesthetic (Schuhmann et al., 2022), and SA-1B (Kirillov et al., 2023).
Dataset Splits | No | The paper describes dataset collection and evaluation of detectors on various generators and datasets. While it mentions training the SSP model on GenImage, it does not provide specific training/validation/test splits for the authors' own collected dataset or for the regression models used in their analysis.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions various models and tools used (e.g., BLIP-2, DINOv2, K-Means algorithm, Canny edge detector), but it does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | No | The paper describes its evaluation setup and mentions using official pre-trained weights and configurations for existing AIGI detectors. It also discusses linear regression analyses. However, it does not provide specific hyperparameters (e.g., learning rate, batch size, epochs) or detailed training configurations for any new models or analyses presented in the main text (e.g., the regression models).
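The patch-selection idea summarized in the table (regressing detector accuracy on image characteristics such as saturation, contrast, and texture richness, then keeping the highest-scoring patches for an off-the-shelf detector) can be sketched as below. This is a minimal illustration, not the paper's implementation: the feature proxies (HSV-style saturation, grayscale standard deviation for contrast, gradient magnitude in place of Canny edge density for texture) and the plain least-squares model are assumptions chosen to keep the sketch self-contained.

```python
import numpy as np

def patch_features(patch):
    """Illustrative per-patch characteristics (not the paper's exact definitions):
    saturation, contrast, and a texture-richness proxy."""
    rgb = patch.astype(np.float64) / 255.0
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    # HSV-style saturation, averaged over pixels
    saturation = np.mean(np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0))
    gray = rgb.mean(axis=-1)
    contrast = gray.std()
    gy, gx = np.gradient(gray)
    # Mean gradient magnitude as a stand-in for Canny edge density
    texture = np.mean(np.hypot(gx, gy))
    return np.array([saturation, contrast, texture])

def fit_score_model(patches, detector_acc):
    """Least-squares linear regression from patch characteristics to a target
    score (e.g., detector accuracy on those patches)."""
    X = np.stack([patch_features(p) for p in patches])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, np.asarray(detector_acc, dtype=np.float64), rcond=None)
    return w

def select_patches(patches, w, k=4):
    """Rank patches by predicted score and keep the top-k as detector input."""
    X = np.stack([patch_features(p) for p in patches])
    X = np.hstack([X, np.ones((len(X), 1))])
    scores = X @ w
    order = np.argsort(scores)[::-1]
    return [patches[i] for i in order[:k]]
```

In use, one would fit `w` on patches with known detector outcomes, then call `select_patches` on patches cropped from a new image before running the detector on the survivors.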