Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Authors: Yaling Shen, Zhixiong Zhuang, Kun Yuan, Maria-Irina Nicolae, Nassir Navab, Nicolas Padoy, Mario Fritz

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data. We validate our ADA-STEAL method on the IU X-RAY and MIMIC-CXR test datasets, showing that it approaches the victim model's performance in both natural language generation metrics and clinical efficacy metrics, even when using the non-medical CIFAR-100 dataset.
Researcher Affiliation Collaboration 1 Bosch Center for Artificial Intelligence, Germany; 2 Technical University of Munich, Germany; 3 Munich Center for Machine Learning, Germany; 4 Saarland University, Germany; 5 University of Strasbourg, France; 6 IHU Strasbourg, France; 7 CISPA Helmholtz Center for Information Security, Germany
Pseudocode No The paper describes the methodology in narrative text and does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not explicitly provide a link to source code, nor does it contain a clear statement that the code for their methodology is being released or is available in supplementary materials.
Open Datasets Yes We validate ADA-STEAL on three standard datasets: IU X-RAY (Demner-Fushman et al. 2016), MIMIC-CXR (Johnson et al. 2019), and CIFAR-100 (Krizhevsky 2009).
Dataset Splits Yes Table 2 summarizes the three datasets with official or conventional training and test split sizes. After preprocessing, the number of test samples in MIMIC-CXR and IU X-RAY is 3858 and 590, respectively.
Hardware Specification Yes All experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies No The paper mentions several models (CHEXAGENT, IDEFICS, ZEPHYR-7B) and implies the use of general software frameworks, but does not provide specific version numbers for key software dependencies such as Python, PyTorch, or CUDA used in the implementation.
Experiment Setup Yes We set the probabilities of abnormal, normal, and original (Ŷ) anatomical descriptions to 80%, 10%, and 10%, respectively, in the new report Y for adversarial perturbation generation. The learning rates for finetuning IDEFICS and CHEXAGENT* are fixed to 5×10⁻⁶ and 1×10⁻⁵, respectively, without weight decay. The maximum new sequence length is set to 512, and a diversity penalty of 0.2 with three beam groups, each containing six beams, is applied. The top-1 response is collected as the generated report. The adversarial noise budget ϵ is set to 0.2 unless otherwise specified. We initialize the attacker query set with 500 images from CIFAR-100, and then repeat the steps of our method three times, resulting in a total query budget of B = 1500.
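The reported hyperparameters can be collected into a single configuration sketch. This is a hedged, plain-Python restatement of the values quoted above, not the authors' code (which is unreleased); all variable names are illustrative, and the dictionary keys mirror common HuggingFace-style generation options only by convention:

```python
# Illustrative restatement of the reported experiment setup (names are
# assumptions; the paper releases no code).

# Mixing probabilities for anatomical descriptions in the new report Y
report_mix = {"abnormal": 0.80, "normal": 0.10, "original": 0.10}
assert abs(sum(report_mix.values()) - 1.0) < 1e-9

# Finetuning learning rates, no weight decay
finetune_lr = {"IDEFICS": 5e-6, "CHEXAGENT*": 1e-5}

# Diverse beam search settings: three groups of six beams each,
# diversity penalty 0.2, top-1 response kept as the generated report
generation = {
    "max_new_tokens": 512,
    "num_beam_groups": 3,
    "num_beams": 3 * 6,        # 18 beams in total
    "diversity_penalty": 0.2,
    "num_return_sequences": 1,
}

epsilon = 0.2                  # adversarial noise budget

# Query budget: 500 initial CIFAR-100 images, three rounds of the method
initial_queries, rounds = 500, 3
B = initial_queries * rounds   # total query budget
assert B == 1500
```

The two assertions simply check internal consistency of the reported numbers (the mixing probabilities sum to one, and 500 images over three rounds gives the stated budget of B = 1500).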