WildFake: A Large-Scale and Hierarchical Dataset for AI-Generated Images Detection
Authors: Yan Hong, Jianming Feng, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations offer insights into the performance of generative models at various levels, showcasing WildFake's unique hierarchical structure's benefits. We have conducted a series of experiments on the WildFake dataset to assess the generalization capabilities of detectors trained on fake images, demonstrating WildFake's potential to enhance the understanding of fake image detection in a multitude of real-world scenarios. Additionally, we have implemented a series of degradation tests on the WildFake testing set, illustrating the robustness of these detectors in challenging conditions. Table 2: Evaluating the generalized performance across different datasets and detectors. Performance metrics including ACC (%), AP (%), and AUC (%) are reported. |
| Researcher Affiliation | Collaboration | ¹Ant Group, ²Qing Yuan Research Institute, Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes methods and processes in prose but does not contain any structured pseudocode or algorithm blocks explicitly labeled as such. |
| Open Source Code | Yes | The WildFake dataset is available at https://github.com/hy-zpg/AIGC-Image-Detection-Dataset |
| Open Datasets | Yes | The WildFake dataset is available at https://github.com/hy-zpg/AIGC-Image-Detection-Dataset. Real images are gathered from open datasets used in various tasks like image captioning, generation, and classification, ensuring a broad spectrum of styles and content. For gathering images from GANs and Others, we primarily utilize official GitHub repositories and model cards from Hugging Face. When these GitHub repositories include generated samples, we directly extract fake images from there. The rule for collecting real images is to stay similar to the training sets of the generators. Considering the fact that fake images from GANs and Others are limited to specific domains determined by training datasets such as COCO (Lin et al. 2014), FFHQ (Karras, Laine, and Aila 2019), ImageNet (Deng et al. 2009), LSUN Church (Yu et al. 2015), CelebA-HQ (Karras et al. 2017), and AFHQ (Choi et al. 2020), we sample parts of the real images from those datasets. Besides, since recent text-to-image generators are mostly trained on Laion-5B (Schuhmann et al. 2022) or the Chinese cross-modal Wukong (Gu et al. 2022a) datasets, we also include real image samples from these text-to-image datasets, which are commonly utilized for training DMs. |
| Dataset Splits | Yes | We split real images (resp., fake images) into the training set and testing set at a ratio of 4:1. In detail, for all generators in Figure 2, 20% of samples are randomly selected as the testing set from the fake images generated by each generator, with the remainder forming the training set. A similar splitting strategy is applied to the real datasets shown in Figure 2. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like the 'Adam optimizer' and 'Exponentially Decay scheduler' and models like 'ResNet50' and 'ViT', but does not provide specific version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | All training images are resized to 224×224, with the Adam optimizer and Exponentially Decay scheduler with an initial learning rate of 1e-4, and batch size (resp., epoch) is set as 1024 (resp., 15). |
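The per-generator 4:1 split quoted under Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline; representing samples as `(generator_name, image_path)` pairs and the function name `split_by_generator` are assumptions made here:

```python
import random
from collections import defaultdict

def split_by_generator(samples, test_ratio=0.2, seed=0):
    """Split (generator_name, image_path) pairs so that test_ratio of
    each generator's images go to the test set, the rest to training."""
    rng = random.Random(seed)
    by_gen = defaultdict(list)
    for gen, path in samples:
        by_gen[gen].append(path)
    train, test = [], []
    for gen, paths in by_gen.items():
        rng.shuffle(paths)  # random selection within each generator
        n_test = int(len(paths) * test_ratio)
        test += [(gen, p) for p in paths[:n_test]]
        train += [(gen, p) for p in paths[n_test:]]
    return train, test
```

Splitting per generator (rather than globally) keeps every generator represented in both partitions, which matches the paper's stated strategy.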
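The reported optimizer settings (Adam, initial learning rate 1e-4, exponential decay, 15 epochs) imply a per-epoch learning-rate schedule that can be written out directly. The decay factor `gamma` below is a guessed placeholder, since the paper does not state its value:

```python
def exp_decay_schedule(initial_lr=1e-4, gamma=0.9, epochs=15):
    """Per-epoch learning rates under exponential decay:
    lr_t = initial_lr * gamma ** t.
    gamma=0.9 is an assumed value; the paper does not report it."""
    return [initial_lr * gamma ** t for t in range(epochs)]
```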
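The metrics reported in Table 2 (ACC, AP, AUC) are standard binary-classification measures. A dependency-free sketch of ACC and AUC (AUC computed via the Mann-Whitney rank statistic, with ties counted as half wins; AP is omitted for brevity) might look like:

```python
def acc_auc(labels, scores, threshold=0.5):
    """ACC at a fixed threshold, and ROC-AUC as the probability that a
    randomly chosen positive scores higher than a randomly chosen negative."""
    preds = [1 if s >= threshold else 0 for s in scores]
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return acc, auc
```

Note that ACC depends on the decision threshold while AUC is threshold-free, which is why detection papers typically report both.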