Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Authors: Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conducted a systematic evaluation of existing score-based MIAs (Yeom et al., 2018; Salem et al., 2019; Ye et al., 2022; Liu et al., 2022; Bertran et al., 2023; Li et al., 2024; Suri et al., 2024) in the transfer learning context. Our results confirm that MIA efficacy generally decreases as the number of examples per class increases for most score-based attacks in transfer learning, consistent with the previously observed power-law relationship. Additionally, we analyze how changing the training paradigm and the properties of the attacks affects MIA efficacy. |
| Researcher Affiliation | Academia | Yuxuan Bai EMAIL Department of Computer Science University of Helsinki Gauri Pradhan EMAIL Department of Computer Science University of Helsinki Marlon Tobaben EMAIL Department of Computer Science University of Helsinki Antti Honkela EMAIL Department of Computer Science University of Helsinki |
| Pseudocode | No | The paper describes various membership inference attacks and experimental procedures but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our experiments is available at: https://github.com/DPBayes/empirical-comparison-mia-transfer-learning. |
| Open Datasets | Yes | Datasets: We use CIFAR-10, CIFAR-100 (Krizhevsky, 2009), and Patch Camelyon (Veeling et al., 2018) in our experiments. CIFAR-10 and CIFAR-100 are common benchmark datasets for MIA evaluation. Patch Camelyon, which contains only 2 classes, enables experiments with a substantially larger number of shots S (examples per class), providing greater insight into how training set size affects MIA efficacy. Models: We use ViT-B/16 (Dosovitskiy et al., 2021) and BiT-M-R50x1 (R-50) (Kolesnikov et al., 2020) as the backbone models for fine-tuning, both pre-trained on ImageNet-21k (Deng et al., 2009). |
| Dataset Splits | No | The paper states: "In each HPO trial, we use 70% of the data for training the model while the remaining 30% is used as validation dataset." This refers only to the split used for hyperparameter optimization (HPO). The paper does not explicitly specify the train/test/validation splits for the main datasets (CIFAR-10, CIFAR-100, Patch Camelyon) used to train the target models in the reported experiments, beyond defining S (shots) as the number of examples per class in the training dataset. |
| Hardware Specification | No | The authors wish to acknowledge CSC IT Center for Science, Finland, for computational resources (Project 2003275). This statement refers to general computational resources but does not provide specific details such as GPU models, CPU types, or other hardware specifications. |
| Software Dependencies | No | We implement HPO using Optuna (Akiba et al., 2019) with the Tree-structured Parzen Estimator (TPE) algorithm (Bergstra et al., 2011). While Optuna and TPE are mentioned, specific version numbers for these or other key software libraries (e.g., PyTorch, TensorFlow, CUDA, Python) are not provided. |
| Experiment Setup | No | Table 2 summarizes the hyperparameters and their corresponding search ranges used in our experiments. While search ranges for Epoch, Train Batch Size, and Learning Rate are provided for hyperparameter optimization, the paper does not explicitly state the specific, optimized hyperparameter values used for the main experimental runs presented in the results sections. |
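To make the "score-based MIA" terminology above concrete, the following is a minimal sketch of the simplest attack the paper evaluates, the loss-threshold attack of Yeom et al. (2018). It is not taken from the paper's repository: the function name `yeom_threshold_attack` and the toy loss values are illustrative assumptions; the only idea carried over from the source is flagging a query example as a member when its loss falls below the target model's mean training loss.

```python
import statistics

def yeom_threshold_attack(train_losses, query_losses):
    """Loss-threshold MIA (Yeom et al., 2018): flag a query example as a
    training-set member when its loss is below the mean training loss."""
    tau = statistics.mean(train_losses)
    return [loss < tau for loss in query_losses]

# Toy usage: members of the training set tend to have lower loss,
# so low-loss queries are flagged as members.
preds = yeom_threshold_attack(
    train_losses=[0.1, 0.2, 0.3],      # target model's losses on its training data
    query_losses=[0.05, 0.15, 0.9, 1.2],  # candidate examples to classify
)
```

The paper's observation that MIA efficacy drops as shots per class grow fits this picture: with more training data the loss gap between members and non-members narrows, so the threshold separates them less cleanly.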
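The S-shot training setup and the 70/30 HPO split quoted above can be sketched as follows. This is a hypothetical reconstruction, not code from the paper's repository: the helper names `sample_shots` and `hpo_split`, the seeding, and the `(x, label)` tuple layout are all assumptions; only the "S examples per class" definition and the 70%/30% proportions come from the source.

```python
import random

def sample_shots(dataset, num_classes, shots, seed=0):
    """Draw exactly `shots` examples per class to form an S-shot training set."""
    rng = random.Random(seed)
    by_class = {c: [] for c in range(num_classes)}
    for x, y in dataset:
        by_class[y].append((x, y))
    subset = []
    for c in range(num_classes):
        subset.extend(rng.sample(by_class[c], shots))
    return subset

def hpo_split(train_set, seed=0):
    """70/30 train/validation split used inside each HPO trial."""
    rng = random.Random(seed)
    data = list(train_set)
    rng.shuffle(data)
    cut = int(0.7 * len(data))
    return data[:cut], data[cut:]
```

For example, with `num_classes=10` and `shots=5` the training set has 50 examples, and each HPO trial then trains on 35 of them and validates on the remaining 15.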