Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Authors: Guosheng Zhang, Keyao Wang, Haixiao Yue, Ajian Liu, Gang Zhang, Kun Yao, Errui Ding, Jingdong Wang

AAAI 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on standard and newly devised One to Eleven cross-domain benchmarks, comprising 12 public datasets, demonstrate that our method significantly outperforms state-of-the-art methods.
Researcher Affiliation | Collaboration | 1 Department of Computer Vision Technology (VIS), Baidu Inc.; 2 CBSR&MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA)
Pseudocode | Yes |
Algorithm 1: Spoof-aware Captioning and Filtering
Input: dataset D = {(I^i, Y^i)}_{i=1}^N, where I^i ∈ {I_R, I_F} and Y^i ∈ {Y_R, Y_F}; attack types F = {print, replay, mask, mannequin}; keywords K = {"paper": Y_print, "screen": Y_replay, ...}; general captioner C_G
Output: dataset D_cap = {(I^i, Y^i, T^i)}_{i=1}^N, where T^i ∈ {T_R, T_S}
1:  Captioning: T_F = C_G(I_F)
2:  Initialize empty dataset D_S
3:  for each sample (I_F^i, Y_F^i, T_F^i) do
4:      for each keyword k in K do
5:          if k in T_F^i and K[k] matches Y_F^i then
6:              D_S ← D_S ∪ {(I_F^i, Y_F^i, T_F^i)}
7:          end if
8:      end for
9:  end for
10: Fine-tune C_G with D_S to obtain C_S
11: Captioning: T_R = C_G(I_R) and T_S = C_S(I_F)
12: D_cap ← {(I_R, Y_R, T_R)} ∪ {(I_F, Y_F, T_S)}
13: return D_cap
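The keyword-matching filter at the core of Algorithm 1 (lines 3-9) can be sketched in Python. This is a minimal illustration under assumed names: `KEYWORDS`, `filter_spoof_captions`, and the sample tuples are hypothetical stand-ins, not the paper's implementation, and only two of the keyword-to-attack mappings from the listing are shown.

```python
# Sketch of Algorithm 1's filtering step: keep a fake-face sample only if
# its generated caption contains a keyword whose mapped attack type
# matches the sample's ground-truth spoof label.
KEYWORDS = {"paper": "print", "screen": "replay"}  # K = {keyword: attack label}

def filter_spoof_captions(samples):
    """samples: list of (image_id, spoof_label, caption) for fake faces.

    Returns the subset D_S whose captions are consistent with their labels.
    """
    kept = []
    for image_id, label, caption in samples:
        for keyword, mapped_label in KEYWORDS.items():
            if keyword in caption.lower() and mapped_label == label:
                kept.append((image_id, label, caption))
                break  # keep each sample at most once
    return kept

fakes = [
    ("img1", "print", "a person holding a paper photo of a face"),
    ("img2", "replay", "a face printed on a paper sheet"),   # label mismatch: dropped
    ("img3", "replay", "a face displayed on a phone screen"),
]
```

In the full algorithm, the surviving subset `D_S` is then used to fine-tune the general captioner `C_G` into the spoof-aware captioner `C_S`.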
Open Source Code | No | The text does not provide an explicit statement of code release or a link to a repository for the methodology described in this paper.
Open Datasets | Yes | We evaluate our method on two protocols. For Protocol 1, following established practices, we implement the leave-one-domain-out testing approach on several datasets: MSU-MFSD (M) (Wen, Han, and Jain 2015), CASIA-MFSD (C) (Zhang et al. 2012), Idiap Replay-Attack (I) (Chingovska, Anjos, and Marcel 2012), and OULU-NPU (O) (Boulkenafet et al. 2017). To assess the robustness of our method in more demanding conditions, we set up Protocol 2 as a One-to-Eleven testing protocol, employing only CelebA-Spoof (Zhang et al. 2020b) as the source domain and 11 datasets as target domains for cross-domain testing. This selection includes MSU-MFSD (Wen, Han, and Jain 2015), CASIA-MFSD (Zhang et al. 2012), Idiap Replay-Attack (Chingovska, Anjos, and Marcel 2012), OULU-NPU (Boulkenafet et al. 2017), SiW (Liu, Jourabloo, and Liu 2018), ROSE-Youtu (Li et al. 2018), HKBU-MARs-V1+ (Liu, Lan, and Yuen 2018), WMCA (George et al. 2019), SiW-Mv2 (Guo et al. 2022), CASIA-SURF 3DMask (Yu et al. 2020a), and HiFiMask (Liu et al. 2022a).
Dataset Splits | Yes | For Protocol 1, following established practices, we implement the leave-one-domain-out testing approach on several datasets: MSU-MFSD (M) (Wen, Han, and Jain 2015), CASIA-MFSD (C) (Zhang et al. 2012), Idiap Replay-Attack (I) (Chingovska, Anjos, and Marcel 2012), and OULU-NPU (O) (Boulkenafet et al. 2017). To assess the robustness of our method in more demanding conditions, we set up Protocol 2 as a One-to-Eleven testing protocol, employing only CelebA-Spoof (Zhang et al. 2020b) as the source domain and 11 datasets as target domains for cross-domain testing.
Hardware Specification | No | The text does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The text mentions pre-trained models (CLIP, OPT-2.7B) but does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment.
Experiment Setup | Yes | Implementation Details: We crop the face images and resize them to 224 × 224 × 3 with RGB channels. For the frozen image encoder, we utilize a pre-trained vision model: ViT-L/14 from CLIP (Radford et al. 2021). Following (Li et al. 2023), OPT-2.7B (Zhang et al. 2022) is adopted as the pre-trained large language model. We use the AdamW optimizer, with an initial learning rate set to 10^-5 and a weight decay parameter set to 10^-2. We configure our training process with a batch size of 32 and a maximum of 10 epochs for both Protocol 1 and Protocol 2. For Protocol 2, we meticulously reproduce the baseline methods, including FLIP (Srivatsan, Naseer, and Nandakumar 2023) and ViTAF (Huang et al. 2022), using the official code provided. Both ViT-B and ViT-L are pre-trained with CLIP (Radford et al. 2021). To ensure the integrity and reproducibility of our experiments, we report all results as the mean of three independent runs, each with a unique initialization seed.
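The hyperparameters quoted above can be collected into a single configuration sketch. The dict layout and key names below are my own convention (the paper provides no config file); the values are the ones stated in the implementation details.

```python
# Training configuration reported in the paper's implementation details,
# gathered as a plain dict for reference. Key names are illustrative.
TRAIN_CONFIG = {
    "image_size": (224, 224, 3),       # cropped face, RGB channels
    "image_encoder": "CLIP ViT-L/14",  # frozen, pre-trained
    "language_model": "OPT-2.7B",      # pre-trained LLM
    "optimizer": "AdamW",
    "learning_rate": 1e-5,
    "weight_decay": 1e-2,
    "batch_size": 32,
    "max_epochs": 10,                  # for both Protocol 1 and Protocol 2
    "num_runs": 3,                     # results averaged over 3 seeds
}
```

Such a dict makes it easy to see at a glance which knobs a re-implementation would need to match, and which (hardware, software versions) remain unreported.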