GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on the GIM demonstrate that GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks. We conduct experiments on our proposed GIM. Both the qualitative and quantitative results demonstrate that GIMFormer can outperform previous state-of-the-art methods."
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University; 2 Tsinghua University; 3 Huawei Noah's Ark Lab
Pseudocode | No | The paper describes methods and architectures such as GIMFormer, ShadowTracer, the Frequency-Spatial Block, and Multi-Windowed Anomalous Modelling, and includes mathematical formulations (equations 1-6) and diagrams (e.g., Figure 3 for the GIMFormer architecture), but it does not present any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code, nor does it include a link to a code repository. It only mentions that some baseline methods were reproduced: "1 indicates that the original paper does not provide the code, we reproduce the code and evaluate it under the same settings."
Open Datasets | No | The paper states: "To this end, we propose a million-level generative-based IMDL dataset, termed GIM dataset, to provide a reliable database for AI Generated Content (AIGC) security. GIM leverages the generative models (Ho, Jain, and Abbeel 2020; Yin et al. 2025) and SAM (Kirillov et al. 2023a), with the images in ImageNet (Deng et al. 2009) and VOC (Everingham et al. 2010) as the input." While it uses public datasets (ImageNet, VOC) as input to its generation pipeline and states that GIM is proposed to "provide a reliable database", it does not explicitly state that the *generated GIM dataset itself* is publicly available, nor does it provide a link, DOI, or repository for accessing it.
Dataset Splits | Yes | "The final benchmark contains about 320k manipulated images with their tampering masks for training and testing." The GIM benchmark uses 100 labels from ImageNet to generate tampered images for training and employs all the test sets from ImageNet and VOC for evaluation. In the mix-generator setting, the models are jointly trained on the GIM-SD, GIM-GLIDE, and GIM-DDNM training sets and tested on the corresponding test datasets to evaluate performance.
Hardware Specification | Yes | "We train our models on 8 V100 GPUs with an initial learning rate of 6e-5, which is scheduled by the poly strategy with power 0.9 over 20 epochs."
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. It mentions the AdamW optimizer but not any software versions.
Experiment Setup | Yes | "We train our models on 8 V100 GPUs with an initial learning rate of 6e-5, which is scheduled by the poly strategy with power 0.9 over 20 epochs. The optimizer is AdamW (Loshchilov and Hutter 2017) with epsilon 1e-8 and weight decay 1e-2, and the batch size is 4 on each GPU."
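The training recipe quoted above (AdamW with epsilon 1e-8 and weight decay 1e-2, initial learning rate 6e-5, poly decay with power 0.9 over 20 epochs, batch size 4 per GPU) can be sketched as below. This is a hedged reconstruction from the quoted hyperparameters, not the authors' released code; the function name `poly_lr`, the per-iteration decay granularity, and the `max_iter` value are assumptions.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial ("poly") learning-rate decay: lr = base_lr * (1 - t/T)^power."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Hyperparameters quoted in the paper's experiment setup.
BASE_LR = 6e-5                 # initial learning rate
POWER = 0.9                    # poly decay exponent
EPOCHS = 20
BATCH_SIZE_PER_GPU = 4
ADAMW_KWARGS = dict(lr=BASE_LR, eps=1e-8, weight_decay=1e-2)

# In PyTorch this would typically be wired up as (assumed, not from the paper):
#   optimizer = torch.optim.AdamW(model.parameters(), **ADAMW_KWARGS)
#   scheduler = torch.optim.lr_scheduler.LambdaLR(
#       optimizer, lambda it: (1.0 - it / max_iter) ** POWER)

max_iter = 10_000  # placeholder; the real value depends on dataset size and GPU count
print(poly_lr(BASE_LR, 0, max_iter))         # full base LR at the first iteration
print(poly_lr(BASE_LR, max_iter, max_iter))  # decays to zero at the final iteration
```

The poly schedule with power 0.9 is the common default in semantic-segmentation training pipelines, which matches the paper's segmentation-style localization head.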