Do Parameters Reveal More than Loss for Membership Inference?
Authors: Anshuman Suri, Xiao Zhang, David Evans
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its effectiveness in simple settings, showing that it outperforms state-of-the-art reference-model-based and prior white-box attacks (Section 4). Our analyses suggest that the improved auditing performance can be directly attributed to access to the model's parameters. ... To evaluate IHA, we efficiently pre-compute L1(w) to facilitate the computation of L0(w) for any given target record z1. |
| Researcher Affiliation | Academia | Anshuman Suri EMAIL University of Virginia; Xiao Zhang EMAIL CISPA Helmholtz Center for Information Security; David Evans EMAIL University of Virginia |
| Pseudocode | No | The paper describes the methodology using mathematical equations and prose, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Our implementation for reproducing all the experiments is available as open-source code at https://github.com/iamgroot42/auditingmi. |
| Open Datasets | Yes | Purchase-100(S). The task for this dataset (Shokri et al., 2017)... MNIST-Odd. We consider the MNIST dataset (LeCun et al., 1998)... Fashion MNIST. We use the Fashion MNIST (Xiao et al., 2017) dataset... |
| Dataset Splits | No | The paper describes how training data for individual models is sampled ("data from each model is sampled at random from the actual dataset with a 50% probability") and how evaluation is performed (FPR/TPR on members/non-members), but it does not specify explicit training/validation/test splits (e.g., 80/10/10 split or specific sample counts) for the datasets themselves for model training. |
| Hardware Specification | No | The paper mentions that 'the Hessian is too large to store on our GPU for IHA and is thus stored on the CPU,' but it does not provide specific models or specifications for the GPU or CPU used for the experiments. |
| Software Dependencies | No | The paper mentions the use of machine learning frameworks and libraries implicitly through the mathematical notation and discussion of SGD, but it does not explicitly list any software dependencies with specific version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | All of our models are trained with momentum (µ = 0.9) and regularization (α = 5e-4), with a learning rate λ = 0.01. |
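The setup described in the table — each record joining a shadow model's training set independently with 50% probability, and SGD with momentum µ = 0.9, weight decay α = 5e-4, and learning rate λ = 0.01 — can be sketched in pure Python. This is a minimal illustration of those hyperparameters, not the paper's actual implementation (see its repository for that); the function names and single-step framing are our own.

```python
import random

def sample_membership(dataset_size, p=0.5, seed=0):
    """Sample a training set: each record is included independently
    with probability p (the paper's 50% membership sampling)."""
    rng = random.Random(seed)
    return [i for i in range(dataset_size) if rng.random() < p]

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum and L2 regularization, using the
    hyperparameters reported in the paper (lr=0.01, mu=0.9, alpha=5e-4)."""
    new_w, new_v = [], []
    for wi, gi, vi in zip(w, grad, velocity):
        g = gi + weight_decay * wi  # gradient of loss + L2 penalty
        v = momentum * vi + g       # momentum buffer update
        new_w.append(wi - lr * v)   # parameter update
        new_v.append(v)
    return new_w, new_v
```

For example, a shadow-model training set for a 1,000-record dataset would be `sample_membership(1000)`, and each optimization step would thread the returned `velocity` back into the next `sgd_step` call.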