Few-Shot Learner Generalizes Across AI-Generated Image Detection

Authors: Shiyu Wu, Jing Liu, Jing Li, Yequan Wang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that FSD achieves state-of-the-art performance by +11.6% average accuracy on the GenImage dataset with only 10 additional samples. More importantly, our method is better able to capture the intra-category commonality in unseen images without further training.
Researcher Affiliation Academia (1) Institute of Automation, Chinese Academy of Sciences, Beijing, China; (2) Beijing Academy of Artificial Intelligence, Beijing, China; (3) University of Chinese Academy of Sciences, Beijing, China; (4) Harbin Institute of Technology, Shenzhen, China.
Pseudocode Yes Pseudocode to compute the loss J(ϕ) is provided in Algorithm 1.
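Algorithm 1 itself is given in the paper; since the experiment setup mentions prototype vectors and Squared Euclidean Distance in a metric space, the loss plausibly follows a prototypical-network-style objective. A minimal sketch under that assumption (function and variable names are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support_emb, support_lbl, query_emb, query_lbl, n_way):
    """Cross-entropy over negative squared Euclidean distances to class prototypes."""
    # Class prototype = mean embedding of that class's support samples.
    prototypes = torch.stack(
        [support_emb[support_lbl == c].mean(dim=0) for c in range(n_way)]
    )  # (n_way, dim)
    # Squared Euclidean distance from each query embedding to each prototype.
    sq_dists = torch.cdist(query_emb, prototypes, p=2) ** 2  # (n_query, n_way)
    # Closer prototype -> higher logit.
    return F.cross_entropy(-sq_dists, query_lbl)
```

In an episodic setting this would be computed once per episode, with the backbone producing `support_emb` and `query_emb`.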
Open Source Code Yes Our code is available at https://github.com/teheperinko541/Few-Shot-AIGI-Detector.
Open Datasets Yes We evaluate our proposed method on the widely used GenImage dataset (Zhu et al., 2023), which contains 1,331,167 real images from ImageNet (Deng et al., 2009) and 1,350,000 generated images from 7 diffusion models and one GAN.
Dataset Splits Yes The real images are first divided into 8 subsets. Each subset is subsequently partitioned into a training part and a test part. Each generator is associated with one specific subset and uses the category labels of the real images within the subset to produce synthetic images. Consequently, the GenImage dataset consists of 8 fake-vs-real subsets for analysis. (...) At every training step, 3 classes are randomly selected from the training set, with 5 samples chosen from each class for the support set and another 5 samples per class for the query set. (...) To maintain a consistent total number of test samples across different settings, we set a fixed ratio of 1 : 3 between the support set and the query set.
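The episodic sampling described above (3 classes per step, 5 support plus 5 query samples per class) can be sketched as follows; the function and its helpers are illustrative, not taken from the released code:

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=3, k_shot=5, k_query=5, rng=random):
    """Pick n_way classes, then disjoint support/query index sets per class."""
    # Group dataset indices by class label.
    by_class = defaultdict(list)
    for idx, lbl in enumerate(labels):
        by_class[lbl].append(idx)
    # Only classes with enough samples for both sets are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + k_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        chosen = rng.sample(by_class[c], k_shot + k_query)
        support.extend(chosen[:k_shot])   # first k_shot -> support set
        query.extend(chosen[k_shot:])     # remaining k_query -> query set
    return support, query
```

Each call yields 15 support and 15 query indices, which would then be fed through the backbone to form prototypes and compute the episode loss.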
Hardware Specification Yes Our method is implemented with the PyTorch library and all the experiments are conducted on a single A100 GPU with 40GB memory.
Software Dependencies No Our method is implemented with the PyTorch library.
Experiment Setup Yes To be more comparable with previous works, we adopt the ResNet-50 (He et al., 2016) pretrained on ImageNet (Deng et al., 2009) as the backbone of our model, which outputs a prototype vector of 1024 dimensions. Following Wang et al. (2023), the input images are first resized to 256×256 and then randomly cropped to 224×224 with random horizontal flipping during training. In contrast, only a center crop to 224×224 is performed after resizing the images during testing. The distance in the metric space is measured with Squared Euclidean Distance. (...) We employ Adam as the optimizer to minimize the cross-entropy loss with a base learning rate of 10^-4. We also adopt a StepLR scheduler with γ = 0.5 and step size = 80000. Each classifier is trained for 200,000 steps, with a batch size of 16.