A Sanity Check for AI-generated Image Detection

Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the effectiveness of our model, we conduct extensive experiments on two popular benchmarks, including AIGCDetect Benchmark (Wang et al., 2020) and GenImage (Zhu et al., 2024), for AI-generated image detection. On AIGCDetect Benchmark and GenImage benchmarks, AIDE surpasses state-of-the-art (SOTA) methods by +3.5% and +4.6% in accuracy scores, respectively. Moreover, AIDE also achieves competitive performance on our Chameleon benchmark."
Researcher Affiliation | Collaboration | Shilin Yan (1), Ouxiang Li (2), Jiayin Cai (1), Yanbin Hao (2), Xiaolong Jiang (1), Yao Hu (1), Weidi Xie (3); (1) Xiaohongshu Inc., (2) University of Science and Technology of China, (3) Shanghai Jiao Tong University.
Pseudocode | No | The paper describes the methodology in Section 4 and provides a diagram in Figure 2, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://shilinyan99.github.io/AIDE/. The paper provides a project-page link but does not explicitly state that the source code for the method is released there or elsewhere; a project page often offers an overview or demonstration rather than direct access to code.
Open Datasets | Yes | "In this paper, we conduct a sanity check on whether the task of AI-generated image detection has been solved. To start with, we present Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. ... To evaluate the effectiveness of our model, we conduct extensive experiments on two popular benchmarks, including AIGCDetect Benchmark (Wang et al., 2020) and GenImage (Zhu et al., 2024), for AI-generated image detection."
Dataset Splits | Yes | "Train-Test Setting-I. In the literature, existing works on detecting AI-generated images (Wang et al., 2020; Frank et al., 2020; Ojha et al., 2023; Wang et al., 2023a; Zhong et al., 2023) have exclusively considered the scenario of training on images from a single generative model, for example, ProGAN (Karras et al., 2018) or Stable Diffusion (Sta, 2022), and then evaluated on images from various generative models. Train-Test Setting-II. Herein, we propose an alternative problem formulation, where the models are allowed to train on images generated from a wide spectrum of generative models, and then tested on images that are genuinely challenging for human perception."
Hardware Specification | Yes | "The model is trained on 8 NVIDIA A100 GPUs for only 5 epochs."
Software Dependencies | No | "We use AdamW optimizer with the learning rate of 1e-4 in B1 and 5e-4 in B2, respectively... For SFE channel, we use the pre-trained OpenCLIP (Ilharco et al., 2021) to extract semantic features." The paper names specific software components such as the AdamW optimizer and OpenCLIP but does not provide version numbers for these or other relevant libraries/frameworks.
Experiment Setup | Yes | "For PFE channel, we first patchify each image into patches and the patch size is set to be N = 32 pixels. Then these patches are sorted using our DCT Scoring module with K = 6 different band-pass filters in the frequency domain. Subsequently, we select two highest-frequency and two lowest-frequency patches using the calculated DCT scores... During the training phase, we use AdamW optimizer with the learning rate of 1e-4 in B1 and 5e-4 in B2, respectively. The batch size is set to 32 and the model is trained on 8 NVIDIA A100 GPUs for only 5 epochs."
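The PFE pipeline quoted above (32-pixel patches, DCT scoring with K = 6 frequency bands, then selecting the two highest- and two lowest-frequency patches) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the band partition (diagonal frequency bands of the 2D DCT spectrum) and the scoring rule (an energy-weighted mean band index) are assumptions standing in for the paper's DCT Scoring module.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def dct2(patch: np.ndarray) -> np.ndarray:
    # 2D DCT: apply the 1D transform along rows, then columns.
    d = dct_matrix(patch.shape[0])
    return d @ patch @ d.T

def patchify(img: np.ndarray, n: int = 32) -> list:
    # Non-overlapping n x n patches in raster order (grayscale image).
    h, w = img.shape
    return [img[i:i + n, j:j + n]
            for i in range(0, h - n + 1, n)
            for j in range(0, w - n + 1, n)]

def dct_score(patch: np.ndarray, n_bands: int = 6) -> float:
    # Score = energy-weighted mean band index over n_bands diagonal
    # frequency bands; higher means more high-frequency content
    # (assumed scoring rule, not the paper's exact one).
    c = np.abs(dct2(patch.astype(np.float64)))
    n = patch.shape[0]
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    band = np.minimum(((u + v) * n_bands) // (2 * n - 1), n_bands - 1)
    energies = np.array([c[band == b].sum() for b in range(n_bands)])
    return float((energies * np.arange(n_bands)).sum() / (energies.sum() + 1e-8))

def select_patches(img: np.ndarray, n: int = 32, k_hi: int = 2, k_lo: int = 2):
    # Keep the k_lo lowest- and k_hi highest-frequency patches by DCT score.
    patches = patchify(img, n)
    scores = [dct_score(p) for p in patches]
    order = np.argsort(scores)  # ascending frequency content
    lo = [patches[i] for i in order[:k_lo]]
    hi = [patches[i] for i in order[-k_hi:]]
    return lo, hi
```

On a toy 64x64 image with one checkerboard (high-frequency) quadrant and three flat quadrants, the checkerboard patch receives the highest score and lands in the high-frequency selection.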