SAFIRE: Segment Any Forged Image Region
Authors: Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam, Minji Son, Changick Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that SAFIRE demonstrates top performance in both the traditional binary IFL and the new task. ... We conduct an ablation study on the key components of our framework: Region-to-Region Contrastive Loss, area adaptive feature in Area-Adaptive Source Segmentation Loss, point prompting, and Confidence Loss (Table 2). We substitute each with a conventional counterpart for comparison. The results demonstrate diminished performance in the absence of any single component compared to the full SAFIRE framework, which integrates all four. |
| Researcher Affiliation | Collaboration | Myung-Joon Kwon1*, Wonjun Lee1*, Seung-Hun Nam2, Minji Son1, Changick Kim1 1School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST) 2NAVER WEBTOON AI, Seongnam, South Korea |
| Pseudocode | No | The paper describes the methodology using equations and textual descriptions of steps (e.g., in 'Pretraining: Region-to-Region Contrastive Learning', 'Training: Source Region Segmentation Using Point Prompts', and 'Inference: Multiple Points Aggregation'), but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code & Data https://github.com/mjkwon2021/SAFIRE |
| Open Datasets | Yes | Code & Data https://github.com/mjkwon2021/SAFIRE ... To facilitate the research on the new task, we construct and release a forgery dataset containing images composed of multiple sources. ... We train the network using a commonly adopted setting (Guillaro et al. 2023) that incorporates four datasets (Kniaz, Knyaz, and Remondino 2019; Novozamsky, Mahdian, and Saic 2020; Dong, Wang, and Tan 2013; Kwon et al. 2022) which consists of real and fake images, also known as the CAT-Net (Kwon et al. 2022) setting. We test the performance using five public datasets which have no overlap with the training datasets: Columbia (Ng, Chang, and Sun 2004), COVERAGE (Wen et al. 2016), Coco Glide (Guillaro et al. 2023), Realistic Tampering (Korus and Huang 2016), and NC16 (Guan et al. 2019). |
| Dataset Splits | No | The paper mentions training on a combination of four datasets and testing on five different public datasets, and it also creates its own datasets (Safire MS-Auto, Safire MS-Expert). However, the main text does not explicitly provide the percentages, sample counts, or methodology for the training, validation, and test splits within these datasets. |
| Hardware Specification | No | The paper mentions 'memory constraints in some comparative methods' when discussing the NC16 dataset, which implies the use of hardware with memory limitations (e.g., GPUs). However, it does not provide specific details on the GPU models, CPU models, or any other hardware specifications used for running experiments. |
| Software Dependencies | No | The paper describes its methodology and experiments but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed for replication. |
| Experiment Setup | Yes | The temperature τ for the region-to-region contrastive learning in Eq. (1) is set to 0.1. The weight limit C_AASS for the AASS loss in Eq. (4) is set to 10, and λ_conf = 0.1 in Eq. (6). During the inference phase, M is fixed to 2 to obtain predictions in binary form. We use 16 × 16 point prompts and k-means clustering. |
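The hyperparameters reported in the experiment-setup cell above can be collected into a replication config. The sketch below is a hypothetical illustration, not the authors' code: the dictionary keys and the `make_point_prompts` helper are assumed names, and cell-center placement of the 16 × 16 point-prompt grid is an assumption (the paper excerpt does not specify exact point locations).

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup
# (key names are illustrative assumptions, not from the paper).
CONFIG = {
    "tau": 0.1,           # temperature for region-to-region contrastive loss, Eq. (1)
    "c_aass": 10,         # weight limit C_AASS for the AASS loss, Eq. (4)
    "lambda_conf": 0.1,   # confidence-loss weight, Eq. (6)
    "num_sources_M": 2,   # M fixed to 2 at inference for binary predictions
    "prompt_grid": 16,    # 16 x 16 point prompts at inference
}

def make_point_prompts(height, width, grid=CONFIG["prompt_grid"]):
    """Lay a uniform grid x grid lattice of point prompts over an image.

    Points are placed at cell centers -- an assumption; the paper only
    states that 16 x 16 point prompts are used.
    Returns an array of shape (grid * grid, 2) holding (y, x) coordinates.
    """
    ys = (np.arange(grid) + 0.5) * height / grid
    xs = (np.arange(grid) + 0.5) * width / grid
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)

points = make_point_prompts(512, 512)
print(points.shape)  # (256, 2)
```

Each of the 256 points would then prompt one forgery-source segmentation, with the per-point predictions aggregated (the paper's "Multiple Points Aggregation" step) via k-means clustering into M = 2 source regions.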