Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Authors: Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average ε of the class with the lowest test accuracy is 44.2% higher than that of the class with the highest accuracy. Our code is available at https://github.com/dayu11/individual_privacy_of_DPSGD. |
| Researcher Affiliation | Collaboration | Da Yu (Sun Yat-sen University), Gautam Kamath (Cheriton School of Computer Science, University of Waterloo), Janardhan Kulkarni (Microsoft Research), Tie-Yan Liu (Microsoft Research), Jian Yin (Sun Yat-sen University), Huishuai Zhang (Microsoft Research) |
| Pseudocode | Yes | Algorithm 1 Differentially Private SGD [...] Algorithm 2 Individual Privacy Accounting for DP-SGD |
| Open Source Code | Yes | Our code is available at https://github.com/dayu11/individual_privacy_of_DPSGD. |
| Open Datasets | Yes | Datasets. We use two benchmark datasets MNIST (n = 60000) and CIFAR-10 (n = 50000) (LeCun et al., 1998; Krizhevsky, 2009) as well as the UTKFace dataset (n ≈ 15000) (Zhang et al., 2017) |
| Dataset Splits | No | The paper mentions using benchmark datasets MNIST (n = 60000), CIFAR-10 (n = 50000), and UTKFace (n ≈ 15000). While these are standard datasets, the paper does not explicitly state the train/test/validation split percentages or sample counts used. For UTKFace, it only mentions modifications to balance labels, not data splits. |
| Hardware Specification | Yes | All experiments are run on single Tesla V100 GPUs with 32G memory. [...] All results in Table 1 use multiprocessing with 5 cores of an AMD EPYC 7V13 CPU. |
| Software Dependencies | No | The paper mentions using the 'Opacus library' and 'the numerical method in Mironov et al. (2019)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Models and hyperparameters. For CIFAR-10, we use the WRN16-4 model in De et al. (2022)... We set C = 1 on CIFAR-10... For MNIST and UTKFace, we set C as the median of gradient norms at initialization... The batchsize is 4096 for CIFAR-10 and 1024 for MNIST and UTKFace. The training epoch is 300 for CIFAR-10 and 100 for MNIST and UTKFace... We update the batch gradient norms three times per epoch for all experiments in this section... |
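The pseudocode rows above reference DP-SGD (Algorithm 1) and individual privacy accounting (Algorithm 2). A minimal sketch of the shared core idea, per-example gradient clipping with Gaussian noise, is shown below. This is an illustrative reimplementation in numpy, not the paper's actual code; the function name `dpsgd_step` and its signature are hypothetical, and individual accounting would additionally feed the returned per-example norms into a per-example RDP accountant.

```python
import numpy as np

def dpsgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params, rng):
    """One DP-SGD update: clip each per-example gradient to `clip_norm`,
    sum the clipped gradients, add Gaussian noise scaled by
    `noise_multiplier * clip_norm`, and take an averaged gradient step.

    Returns the updated parameters and the per-example gradient norms,
    which individual privacy accounting uses as per-example sensitivities.
    """
    clipped, norms = [], []
    for g in per_example_grads:
        norm = float(np.linalg.norm(g))
        norms.append(norm)
        # Scale down gradients whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    new_params = params - lr * noisy_sum / len(per_example_grads)
    return new_params, norms
```

Examples whose gradient norms stay below `clip_norm` contribute less sensitivity than the worst case, which is why, as the abstract notes, most examples enjoy stronger privacy than the worst-case ε.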