DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
Authors: Khanh Nguyen, Raouf Kerkouche, Mario Fritz, Dimosthenis Karatzas
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating our attacks on three multi-modal DocVQA models and two datasets, we achieve state-of-the-art performance against multiple baselines, demonstrating their effectiveness and highlighting the privacy risks in this domain. |
| Researcher Affiliation | Academia | 1Computer Vision Center, Universitat Autònoma de Barcelona 2CISPA Helmholtz Center for Information Security |
| Pseudocode | Yes | Algorithm 1 DocMIA Assignment |
| Open Source Code | Yes | Code is available at https://github.com/khanhnguyen21006/mia_docvqa |
| Open Datasets | Yes | We study two established DocVQA datasets in the literature for our analysis: DocVQA (DVQA) (Mathew et al., 2021) and PFL-DocVQA (PFL) (Tito et al., 2024). |
| Dataset Splits | Yes | From the official splits of each target dataset, we sample 300 member documents from the training set and 300 non-member documents from the test set, yielding Ntest = 600 test documents. [...] In Table 5, we present statistics for both the DocVQA and PFL-DocVQA datasets. Split / Num. Docs / Num. Questions: Train 69894 / 221316; Val 9150 / 30491; Test 13463 / 43591 |
| Hardware Specification | Yes | All attack methods are implemented using PyTorch and executed on an NVIDIA GeForce A40 GPU with 45 GB of memory. |
| Software Dependencies | No | All attack methods are implemented using PyTorch and executed on an NVIDIA GeForce A40 GPU with 45 GB of memory. [...] We assume the adversary has full knowledge of the DocVQA task used to train the model, including the training objective, document type and exact training questions. This assumption is reasonable, as task-level information, such as document type, is often publicly available to guide users, making it accessible to adversaries. [...] we use Adam (Kingma, 2014) as the optimizer OPT across all attack experiments. |
| Experiment Setup | Yes | We tune the hyperparameters in the optimization process to ensure our attacks are effective against each target model in the white-box setting, then apply the best set to black-box attacks. Assuming no access to the training algorithm T, we use Adam (Kingma, 2014) as the optimizer OPT across all attack experiments. We explore the impact of learning rate α, the selected layer L, and we carefully tune the values of threshold τ in the ablation study (Appendix C). The optimal set of hyperparameters for each model is then applied in all black-box experiments. For aggregation Φ, we consider 4 aggregation functions {AVG; MIN; MAX; MED} for each feature, denoted as Φall. Throughout our experiments, we employ KMEANS as the clustering algorithm. See Appendix D for more implementation details. [...] Table 7: Best hyperparameters from our tuning process with consistent performance across both the PFL and DocVQA datasets. For VT5: αFL = 0.001, αIG = 1.0, S = 200, L = last FC layer, τFL = 10⁻⁶, τIG = 10⁻⁵ |
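The quoted setup fine-tunes the target model on each candidate document (Adam, learning rate α, up to S steps, loss threshold τ) and uses how easily the loss descends as the membership signal: documents seen during training typically converge faster. A minimal sketch of one such descriptor, the number of steps needed to push the loss below τ. Note that `toy_loss_grad` is a purely hypothetical stand-in; the paper optimizes the actual DocVQA model, not a quadratic:

```python
import numpy as np

def steps_to_threshold(loss_grad, w0, alpha=0.001, S=200, tau=1e-6):
    """Run up to S gradient steps with learning rate alpha and return
    the first step at which the loss falls below tau (else S).
    Fewer steps suggests the document was seen during training."""
    w = np.asarray(w0, dtype=float)
    for step in range(S):
        loss, grad = loss_grad(w)
        if loss < tau:
            return step
        w = w - alpha * grad
    return S

# Illustrative stand-in for the per-document fine-tuning loss: a quadratic
# whose minimum plays the role of a perfectly fitted model.
def toy_loss_grad(w):
    return 0.5 * float(w @ w), w

# A "member-like" start (already near the optimum) converges in fewer steps.
member_steps = steps_to_threshold(toy_loss_grad, [0.01], alpha=0.1)
nonmember_steps = steps_to_threshold(toy_loss_grad, [1.0], alpha=0.1)
```

The gap between `member_steps` and `nonmember_steps` is exactly the kind of per-document feature the attack aggregates and clusters.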
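Per the quoted setup, each document's per-step features are summarized with the four aggregation functions Φall = {AVG; MIN; MAX; MED}, and the resulting feature vectors are split into two groups with KMEANS, so no labeled threshold is needed. A self-contained sketch, using a minimal Lloyd's-algorithm 2-means (not any specific library implementation) and synthetic traces in place of real optimization features:

```python
import numpy as np

def aggregate(trace):
    """Phi_all: summarize one document's per-step feature trace with
    the four aggregation functions {AVG, MIN, MAX, MED}."""
    v = np.asarray(trace, dtype=float)
    return np.array([v.mean(), v.min(), v.max(), np.median(v)])

def two_means(X, n_iter=100, seed=0):
    """Minimal 2-cluster KMeans (Lloyd's algorithm): returns a 0/1 label
    per row, splitting documents into presumed members / non-members."""
    rng = np.random.default_rng(seed)
    c0 = X[rng.integers(len(X))]
    c1 = X[np.linalg.norm(X - c0, axis=1).argmax()]  # farthest-point init
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        new = np.stack([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Synthetic traces: member-like documents produce systematically smaller
# feature values than non-member-like ones (illustrative numbers only).
rng = np.random.default_rng(1)
feats = np.vstack([aggregate(rng.normal(0.0, 0.05, 50)) for _ in range(20)]
                  + [aggregate(rng.normal(3.0, 0.05, 50)) for _ in range(20)])
labels = two_means(feats)
```

Which cluster is labeled "member" is decided afterwards, e.g. by which cluster centroid corresponds to easier optimization.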