DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
Authors: Khanh Nguyen, Raouf Kerkouche, Mario Fritz, Dimosthenis Karatzas
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating our attacks on three multi-modal DocVQA models and two datasets, we achieve state-of-the-art performance against multiple baselines, demonstrating their effectiveness and highlighting the privacy risks in this domain. |
| Researcher Affiliation | Academia | 1Computer Vision Center, Universitat Autònoma de Barcelona 2CISPA Helmholtz Center for Information Security |
| Pseudocode | Yes | Algorithm 1 DocMIA Assignment |
| Open Source Code | Yes | Code is available at https://github.com/khanhnguyen21006/mia_docvqa |
| Open Datasets | Yes | We study two established DocVQA datasets in the literature for our analysis: DocVQA (DVQA) (Mathew et al., 2021) and PFL-DocVQA (PFL) (Tito et al., 2024). |
| Dataset Splits | Yes | From the official splits of each target dataset, we sample 300 member documents from the training set and 300 non-member documents from the test set, yielding Ntest = 600 test documents. [...] In Table 5, we present statistics for both the DocVQA and PFL-DocVQA datasets. Split / Num. Docs / Num. Questions: Train 69894 / 221316; Val 9150 / 30491; Test 13463 / 43591 |
| Hardware Specification | Yes | All attack methods are implemented using PyTorch and executed on an NVIDIA GeForce A40 GPU with 45 GB of memory. |
| Software Dependencies | No | All attack methods are implemented using PyTorch and executed on an NVIDIA GeForce A40 GPU with 45 GB of memory. [...] We assume the adversary has full knowledge of the DocVQA task used to train the model, including the training objective, document type and exact training questions. This assumption is reasonable, as task-level information, such as document type, is often publicly available to guide users, making it accessible to adversaries. [...] we use Adam (Kingma, 2014) as the optimizer OPT across all attack experiments. |
| Experiment Setup | Yes | We tune the hyperparameters in the optimization process to ensure our attacks are effective against each target model in the white-box setting, then apply the best set to black-box attacks. Assuming no access to the training algorithm T, we use Adam (Kingma, 2014) as the optimizer OPT across all attack experiments. We explore the impact of learning rate α, the selected layer L, and we carefully tune the values of threshold τ in the ablation study (Appendix C). The optimal set of hyperparameters for each model is then applied in all black-box experiments. For aggregation Φ, we consider 4 aggregation functions {AVG; MIN; MAX; MED} for each feature, denoted as Φall. Throughout our experiments, we employ KMEANS as the clustering algorithm. See Appendix D for more implementation details. [...] Table 7: Best hyperparameters from our tuning process with consistent performance across both the PFL and DocVQA datasets. For VT5: αFL = 0.001, αIG = 1.0, S = 200, L = last FC layer, τFL = 10⁻⁶, τIG = 10⁻⁵ |
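The quoted setup fine-tunes the target model on each candidate document (Adam, learning rate α, up to S steps, loss threshold τ) and uses how easily the loss descends as the membership signal: documents seen during training typically converge faster. A minimal sketch of one such descriptor, the number of steps needed to push the loss below τ. Note that `toy_loss_grad` is a purely hypothetical stand-in; the paper optimizes the actual DocVQA model, not a quadratic:

```python
import numpy as np

def steps_to_threshold(loss_grad, w0, alpha=0.001, S=200, tau=1e-6):
    """Run up to S gradient steps with learning rate alpha and return
    the first step at which the loss falls below tau (else S).
    Fewer steps suggests the document was seen during training."""
    w = np.asarray(w0, dtype=float)
    for step in range(S):
        loss, grad = loss_grad(w)
        if loss < tau:
            return step
        w = w - alpha * grad
    return S

# Illustrative stand-in for the per-document fine-tuning loss: a quadratic
# whose minimum plays the role of a perfectly fitted model.
def toy_loss_grad(w):
    return 0.5 * float(w @ w), w

# A "member-like" start (already near the optimum) converges in fewer steps.
member_steps = steps_to_threshold(toy_loss_grad, [0.01], alpha=0.1)
nonmember_steps = steps_to_threshold(toy_loss_grad, [1.0], alpha=0.1)
```

The gap between `member_steps` and `nonmember_steps` is exactly the kind of per-document feature the attack aggregates and clusters.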
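Per the quoted setup, each document's per-step features are summarized with the four aggregation functions Φall = {AVG; MIN; MAX; MED}, and the resulting feature vectors are split into two groups with KMEANS, so no labeled threshold is needed. A self-contained sketch, using a minimal Lloyd's-algorithm 2-means (not any specific library implementation) and synthetic traces in place of real optimization features:

```python
import numpy as np

def aggregate(trace):
    """Phi_all: summarize one document's per-step feature trace with
    the four aggregation functions {AVG, MIN, MAX, MED}."""
    v = np.asarray(trace, dtype=float)
    return np.array([v.mean(), v.min(), v.max(), np.median(v)])

def two_means(X, n_iter=100, seed=0):
    """Minimal 2-cluster KMeans (Lloyd's algorithm): returns a 0/1 label
    per row, splitting documents into presumed members / non-members."""
    rng = np.random.default_rng(seed)
    c0 = X[rng.integers(len(X))]
    c1 = X[np.linalg.norm(X - c0, axis=1).argmax()]  # farthest-point init
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        new = np.stack([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic traces: member-like documents produce systematically smaller
# feature values than non-member-like ones (illustrative numbers only).
rng = np.random.default_rng(1)
feats = np.vstack([aggregate(rng.normal(0.0, 0.05, 50)) for _ in range(20)]
                  + [aggregate(rng.normal(3.0, 0.05, 50)) for _ in range(20)])
labels = two_means(feats)
```

Which cluster is labeled "member" is decided afterwards, e.g. by which cluster centroid corresponds to easier optimization.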