Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Authors: Yaling Shen, Zhixiong Zhuang, Kun Yuan, Maria-Irina Nicolae, Nassir Navab, Nicolas Padoy, Mario Fritz

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data. We validate our ADA-STEAL method on the IU X-RAY and MIMIC-CXR test datasets, showing that it approaches the victim model's performance in both natural language generation metrics and clinical efficacy metrics, even when using the non-medical CIFAR-100 dataset.
Researcher Affiliation Collaboration 1 Bosch Center for Artificial Intelligence, Germany; 2 Technical University of Munich, Germany; 3 Munich Center for Machine Learning, Germany; 4 Saarland University, Germany; 5 University of Strasbourg, France; 6 IHU Strasbourg, France; 7 CISPA Helmholtz Center for Information Security, Germany
Pseudocode No The paper describes the methodology in narrative text and does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not explicitly provide a link to source code, nor does it contain a clear statement that the code for their methodology is being released or is available in supplementary materials.
Open Datasets Yes We validate ADA-STEAL on three standard datasets: IU X-RAY (Demner-Fushman et al. 2016), MIMIC-CXR (Johnson et al. 2019), and CIFAR-100 (Krizhevsky 2009).
Dataset Splits Yes Table 2 summarizes the three datasets with official or conventional training and test split sizes. After preprocessing, the number of test samples in MIMIC-CXR and IU X-RAY is 3858 and 590, respectively.
Hardware Specification Yes All experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies No The paper mentions several models (CHEXAGENT, IDEFICS, ZEPHYR-7B) and implies the use of general software frameworks, but does not provide specific version numbers for key software dependencies such as Python, PyTorch, or CUDA used in the implementation.
Experiment Setup Yes We set the probabilities of abnormal, normal, and original (Ŷ) anatomical descriptions to 80%, 10%, and 10%, respectively, in the new report Y for adversarial perturbation generation. The learning rates for finetuning IDEFICS and CHEXAGENT* are fixed to 5×10⁻⁶ and 1×10⁻⁵, respectively, without weight decay. The maximum new sequence length is set to 512, and a diversity penalty of 0.2 with three beam groups, each containing six beams, is applied. The top-1 response is collected as the generated report. The adversarial noise budget ϵ is set to 0.2 unless otherwise specified. We initialize the attacker query set with 500 images from CIFAR-100, and then repeat the steps of our method three times, resulting in a total query budget of B = 1500.
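The reported hyperparameters can be collected into a single configuration sketch. This is a hedged, plain-Python restatement of the values quoted above, not the authors' code (which is unreleased); all variable names are illustrative, and the dictionary keys mirror common HuggingFace-style generation options only by convention:

```python
# Illustrative restatement of the reported experiment setup (names are
# assumptions; the paper releases no code).

# Mixing probabilities for anatomical descriptions in the new report Y
report_mix = {"abnormal": 0.80, "normal": 0.10, "original": 0.10}
assert abs(sum(report_mix.values()) - 1.0) < 1e-9

# Finetuning learning rates, no weight decay
finetune_lr = {"IDEFICS": 5e-6, "CHEXAGENT*": 1e-5}

# Diverse beam search settings: three groups of six beams each,
# diversity penalty 0.2, top-1 response kept as the generated report
generation = {
    "max_new_tokens": 512,
    "num_beam_groups": 3,
    "num_beams": 3 * 6,        # 18 beams in total
    "diversity_penalty": 0.2,
    "num_return_sequences": 1,
}

epsilon = 0.2                  # adversarial noise budget

# Query budget: 500 initial CIFAR-100 images, three rounds of the method
initial_queries, rounds = 500, 3
B = initial_queries * rounds   # total query budget
assert B == 1500
```

The two assertions simply check internal consistency of the reported numbers (the mixing probabilities sum to one, and 500 images over three rounds gives the stated budget of B = 1500).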