reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provable Membership Inference Privacy

Authors: Zachary Izzo, Jinsung Yoon, Sercan O Arik, James Zou

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work, we propose a novel privacy notion, membership inference privacy (MIP)... We give a precise characterization of the relationship between MIP and DP... As a proof of concept, we also provide a simple algorithm for guaranteeing MIP without needing to guarantee DP. Lastly, this paper focused on developing on the theoretical principles and guarantees of MIP. Taking advantage of the relaxed requirements of MIP to develop practical algorithms, and systematic empirical evaluation of these algorithms, is an important direction for future work.
Researcher Affiliation	Collaboration	Zachary Izzo (NEC Labs America); Jinsung Yoon (Google Cloud AI); Sercan Ö. Arık (Google Cloud AI); James Zou (Department of Biomedical Data Science, Stanford University)
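Pseudocode	Yes	Algorithm 1: MIP via noise addition. Require: private dataset $D$, $\sigma$ estimation budget $B$, MIP parameter $\eta$. $D_{\text{train}} \leftarrow \text{RandomSplit}(D, 1/2)$. # Estimate $\sigma$ if an a priori bound is not known: for $i = 1, \ldots, B$ do $D^{(i)}_{\text{train}} \leftarrow \text{RandomSplit}(D_{\text{train}}, 1/2)$; $\theta^{(i)} \leftarrow A(D^{(i)}_{\text{train}})$; end for. for $j = 1, \ldots, d$ do $\bar{\theta}_j \leftarrow \frac{1}{B} \sum_{i=1}^{B} \theta^{(i)}_j$; $\sigma_j \leftarrow \left( \frac{1}{B} \sum_{i=1}^{B} (\theta^{(i)}_j - \bar{\theta}_j)^M \right)^{1/M}$. # Add appropriate noise to the base algorithm's output: $U \leftarrow \text{Unif}(\{u \in \mathbb{R}^d : \|u\|_{\sigma,M} = 1\})$; $r \leftarrow \text{Laplace}(6.16 \ldots)$; return $A(D_{\text{train}}) + X$.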
Open Source Code	No	The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets	No	The paper uses a generic dataset 'D' for theoretical discussions and proofs without referring to any specific, publicly available dataset used for empirical evaluation.
Dataset Splits	No	The paper describes theoretical concepts and an algorithm but does not perform experiments on specific datasets, so no dataset split information is provided. Algorithm 1 includes '$D_{\text{train}} \leftarrow \text{RandomSplit}(D, 1/2)$' as a theoretical step, not an empirical dataset split.
Hardware Specification	No	The paper focuses on theoretical development and does not describe any experimental setup or specific hardware used for computations.
Software Dependencies	No	The paper focuses on theoretical development and does not mention any specific software dependencies or their version numbers.
Experiment Setup	No	The paper is theoretical in nature, proposing a new privacy notion and an algorithm without detailing concrete experimental setups, hyperparameters, or training configurations for a practical application. Figure 1 shows theoretical noise level comparisons, not empirical experiment results.
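
The σ-estimation loop in the Algorithm 1 excerpt quoted in the Pseudocode row can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`random_split`, `estimate_sigma`, `coordinate_mean`), the toy Gaussian dataset, and the choice of a per-coordinate mean as the base algorithm A are all hypothetical. Taking the absolute value before raising to the M-th power is also an assumption, made so that odd M stays well defined. The noise-addition step is left out because the Laplace scale is not fully recoverable from the excerpt.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def random_split(data: Sequence, fraction: float, rng: random.Random) -> list:
    """Return a uniformly random subset holding `fraction` of `data`."""
    k = int(len(data) * fraction)
    return rng.sample(list(data), k)


def estimate_sigma(
    d_train: Sequence,
    base_algorithm: Callable[[Sequence], Vector],
    budget: int,
    moment: int,
    rng: random.Random,
) -> Vector:
    """Estimate per-coordinate scales sigma_j from `budget` runs on half-splits."""
    # theta^(i) <- A(RandomSplit(D_train, 1/2)) for i = 1, ..., B
    thetas = [
        base_algorithm(random_split(d_train, 0.5, rng)) for _ in range(budget)
    ]
    dim = len(thetas[0])
    sigma = []
    for j in range(dim):
        mean_j = sum(t[j] for t in thetas) / budget
        # abs() is an assumption; the quoted excerpt shows a plain M-th power.
        central = sum(abs(t[j] - mean_j) ** moment for t in thetas) / budget
        sigma.append(central ** (1.0 / moment))
    return sigma


def coordinate_mean(data: Sequence) -> Vector:
    """Hypothetical base algorithm A: the per-coordinate sample mean."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


rng = random.Random(0)
# Toy data: the second coordinate is five times as spread out as the first.
dataset = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 5.0)] for _ in range(400)]
d_train = random_split(dataset, 0.5, rng)
sigma = estimate_sigma(d_train, coordinate_mean, budget=50, moment=2, rng=rng)
```

With `moment=2` each `sigma[j]` is the standard deviation of the base algorithm's j-th output across the resampled half-splits, so coordinates whose output varies more receive a larger scale.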