Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Authors: Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average ε of the class with the lowest test accuracy is 44.2% higher than that of the class with the highest accuracy. Our code is available at https://github.com/dayu11/individual_privacy_of_DPSGD.
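The per-example claim quoted above rests on a simple mechanism-level fact: an example whose gradient norm falls below the clipping norm C contributes less sensitivity, and therefore a smaller per-step privacy cost. A minimal sketch of that idea via individual Rényi DP for the Gaussian mechanism (ignoring subsampling amplification, which the paper's full accountant handles; the function and parameter names here are illustrative, not the authors' API):

```python
def individual_rdp_per_step(grad_norm, clip_norm, noise_multiplier, alpha):
    """Per-step individual RDP of the Gaussian mechanism at order alpha.

    The example's sensitivity is min(||g||, C); the noise added to the
    summed gradient has std sigma * C. Smaller gradients => smaller cost.
    Hedged sketch only: no subsampling amplification, no composition.
    """
    sensitivity = min(grad_norm, clip_norm)
    sigma = noise_multiplier * clip_norm  # noise std on the summed gradient
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)
```

An example clipped to half the norm of a worst-case example incurs a quarter of its per-step RDP cost, which is the gap the paper's individual accounting makes visible.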
Researcher Affiliation | Collaboration | Da Yu (Sun Yat-sen University); Gautam Kamath (Cheriton School of Computer Science, University of Waterloo); Janardhan Kulkarni (Microsoft Research); Tie-Yan Liu (Microsoft Research); Jian Yin (Sun Yat-sen University); Huishuai Zhang (Microsoft Research)
Pseudocode | Yes | Algorithm 1 Differentially Private SGD [...] Algorithm 2 Individual Privacy Accounting for DP-SGD
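The quoted Algorithm 1 is the standard DP-SGD step: clip each per-example gradient to L2 norm C, sum, add Gaussian noise, and descend. A minimal NumPy sketch assuming flattened gradients; this is an illustration of the standard algorithm, not the authors' Opacus-based implementation:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params, rng):
    """One DP-SGD update (sketch of the standard Algorithm 1).

    Each per-example gradient is rescaled to L2 norm at most clip_norm,
    the clipped gradients are summed, Gaussian noise with std
    noise_multiplier * clip_norm is added, and the averaged noisy
    gradient is used for a plain SGD step.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```

The clipping step is what makes per-example gradient norms, and hence individual privacy costs, heterogeneous across the dataset.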
Open Source Code | Yes | Our code is available at https://github.com/dayu11/individual_privacy_of_DPSGD.
Open Datasets | Yes | Datasets. We use two benchmark datasets MNIST (n = 60000) and CIFAR-10 (n = 50000) (LeCun et al., 1998; Krizhevsky, 2009) as well as the UTKFace dataset (n ≈ 15000) (Zhang et al., 2017)
Dataset Splits | No | The paper mentions using the benchmark datasets MNIST (n = 60000), CIFAR-10 (n = 50000), and UTKFace (n ≈ 15000). While these are standard datasets, the paper does not explicitly state the train/test/validation split percentages or sample counts it uses. For UTKFace, it only describes modifications to balance labels, not data splits.
Hardware Specification | Yes | All experiments are run on single Tesla V100 GPUs with 32G memory. [...] All results in Table 1 use multiprocessing with 5 cores of an AMD EPYC 7V13 CPU.
Software Dependencies | No | The paper mentions using the Opacus library and 'the numerical method in Mironov et al. (2019)' but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | Models and hyperparameters. For CIFAR-10, we use the WRN16-4 model in De et al. (2022)... We set C = 1 on CIFAR-10... For MNIST and UTKFace, we set C as the median of gradient norms at initialization... The batchsize is 4096 for CIFAR-10 and 1024 for MNIST and UTKFace. The training epoch is 300 for CIFAR-10 and 100 for MNIST and UTKFace... We update the batch gradient norms three times per epoch for all experiments in this section...
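The heuristic quoted for MNIST and UTKFace, setting the clipping norm C to the median per-example gradient norm at initialization, is easy to sketch. A minimal illustration assuming flattened per-example gradients (function name is illustrative, not from the authors' code):

```python
import numpy as np

def clip_threshold_from_median(per_example_grads):
    """Return the median L2 norm of a batch of per-example gradients.

    Sketch of the paper's stated heuristic for choosing C on MNIST and
    UTKFace: compute per-example gradient norms at initialization and
    take their median, so roughly half the examples are clipped.
    """
    norms = [np.linalg.norm(g) for g in per_example_grads]
    return float(np.median(norms))
```

Choosing the median (rather than, say, the maximum) keeps the noise scale σC moderate while still bounding every example's contribution.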