Differentially private partitioned variational inference

Authors: Mikko A. Heikkilä, Matthew Ashman, Siddharth Swaroop, Richard E Turner, Antti Honkela

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations in the general framework, one based on perturbing local optimisation runs done by individual parties, and two based on perturbing updates to the global model (one using a version of federated averaging, the second one adding virtual parties to the protocol), and compare their properties both theoretically and empirically. [...] In this Section we empirically test our proposed methods using logistic regression and Bayesian neural network (BNN) models. Our code for running all the experiments is openly available from https://github.com/DPBayes/DPPVI. We utilise a mean-field Gaussian variational approximation in all experiments. For datasets and prediction tasks, we employ UCI Adult (Kohavi, 1996; Dua & Graff, 2017) predicting whether an individual has income > 50k, as well as balanced MIMIC-III health data (Johnson et al., 2016b;a; Goldberger et al., 2000) with an in-hospital mortality prediction task (Harutyunyan et al., 2019).
Researcher Affiliation | Collaboration | Mikko A. Heikkilä (EMAIL, Telefónica Research); Matthew Ashman (EMAIL, Department of Engineering, University of Cambridge); Siddharth Swaroop (EMAIL, School of Engineering and Applied Sciences, Harvard University); Richard E. Turner (EMAIL, Department of Engineering, University of Cambridge); Antti Honkela (EMAIL, Department of Computer Science, University of Helsinki)
Pseudocode | Yes | Algorithm 1: Non-private PVI (Ashman et al., 2022); Algorithm 2: PVI with local averaging; Algorithm 3: PVI with virtual clients
Open Source Code | Yes | Our code for running all the experiments is openly available from https://github.com/DPBayes/DPPVI.
Open Datasets | Yes | For datasets and prediction tasks, we employ UCI Adult (Kohavi, 1996; Dua & Graff, 2017) predicting whether an individual has income > 50k, as well as balanced MIMIC-III health data (Johnson et al., 2016b;a; Goldberger et al., 2000) with an in-hospital mortality prediction task (Harutyunyan et al., 2019).
Dataset Splits | Yes | With Adult data, we first combine the training and test sets, and then randomly split the whole data with 80% for training and 20% for validation. With MIMIC-III data, we first preprocess the data for the in-hospital mortality prediction task as detailed by Harutyunyan et al. (2019). [...] Since the preprocessed data is very unbalanced and leaves little room for showing the differences between the methods (a constant prediction can reach close to 90% accuracy, while a non-DP prediction can do some percentage points better), we first re-balance the data by keeping only as many majority-label samples as there are in the minority class. This leaves 5594 samples, which are then randomly split into training and validation sets, giving a total of 4475 samples of training data to be divided over all the clients. We divide the data between M clients using the following scheme: half of the clients are small and the other half large, with data sizes given by n_small = (n/M)(1 − ρ), n_large = (n/M)(1 + ρ), with ρ ∈ [0, 1]. ρ = 0 gives equal data sizes for everyone, while ρ = 1 means that the small clients have no data. For creating unbalanced data distributions, denote the fraction of majority-class samples by λ. Then the target fraction of majority-class samples for the small clients is parameterized by κ: λ_small^target = λ + (1 − λ)κ, where having κ = 1 means small clients have only majority-class labels, and κ = −λ/(1 − λ) implies small clients have only minority-class labels. For large clients the labels are divided randomly. We use the following splits in the experiments: Table 2: Adult data, 10-client data split. Table 3: Adult data, 200-client data split. Table 4: Balanced MIMIC-III data, 10-client data split.
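The client-split scheme quoted above can be sketched in a few lines of Python. This is an illustrative reconstruction, not code from the DPPVI repository: `client_sizes` and `target_majority_fraction` are hypothetical helper names, and rounding the per-client sizes to integers is an assumption.

```python
def client_sizes(n, M, rho):
    """Split n training samples over M clients, half small and half large:
    n_small = (n/M)(1 - rho), n_large = (n/M)(1 + rho), with rho in [0, 1].
    rho = 0 gives equal sizes; rho = 1 leaves the small clients with no data."""
    n_small = int(round(n / M * (1 - rho)))
    n_large = int(round(n / M * (1 + rho)))
    return [n_small] * (M // 2) + [n_large] * (M // 2)


def target_majority_fraction(lam, kappa):
    """Target fraction of majority-class samples on the small clients:
    lambda_target = lambda + (1 - lambda) * kappa.
    kappa = 1 gives only majority-class labels;
    kappa = -lambda / (1 - lambda) gives only minority-class labels."""
    return lam + (1 - lam) * kappa


# Balanced MIMIC-III setting from the quote: 4475 training samples, 10 clients.
sizes = client_sizes(4475, 10, rho=0.5)
```

For the balanced MIMIC-III data (λ = 0.5), setting κ = −λ/(1 − λ) = −1 indeed drives the small clients' majority fraction to zero, matching the boundary case described in the quote.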
Hardware Specification | No | The authors acknowledge CSC IT Center for Science, Finland, and the Finnish Computing Competence Infrastructure (FCCI) for computational and data storage resources.
Software Dependencies | No | We use Adam (Kingma & Ba, 2014) to optimise all objective functions. [...] All privacy bounds are calculated numerically with the Fourier accountant (Koskela et al., 2020).
Experiment Setup | Yes | We utilise a mean-field Gaussian variational approximation in all experiments. For datasets and prediction tasks, we employ UCI Adult (Kohavi, 1996; Dua & Graff, 2017) predicting whether an individual has income > 50k, as well as balanced MIMIC-III health data (Johnson et al., 2016b;a; Goldberger et al., 2000) with an in-hospital mortality prediction task (Harutyunyan et al., 2019). In all experiments, we use sequential PVI when not assuming a trusted aggregator, and synchronous PVI otherwise. The number of communications is measured as the number of server-client message exchanges performed by all clients. The actual wall-clock times would depend on the method and implementation: with sequential PVI only one client can update at any one time but communications do not need encryption, while with synchronous PVI all clients can update at the same time but the trusted aggregator methods would also need to account for the time taken by the secure primitive in question. All privacy bounds are calculated numerically with the Fourier accountant (Koskela et al., 2020). The reported privacy budgets include only the privacy cost of the actual learning, while we ignore the privacy leakage due to hyperparameter tuning. More details on the experiments can be found in Appendix B. [...] We use Adam (Kingma & Ba, 2014) to optimise all objective functions. In general, depending e.g. on the update schedule, even the non-DP PVI can diverge (see Ashman et al. 2022). We found that DP-PVI using any of our approaches is more prone to diverge than non-DP PVI, while DP optimisation is more stable than local averaging or virtual PVI clients. To improve model stability, we use some damping in all the experiments. When damping with a factor ρ ∈ (0, 1], at global update s the model parameters λ(s) are set to (1 − ρ)λ(s−1) + ρλ(s).
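The damping rule quoted above is a simple convex combination of the previous and newly proposed global parameters. Below is a minimal sketch assuming the variational parameters are held as NumPy arrays; `damped_update` is an illustrative name, not a function from the paper's code.

```python
import numpy as np


def damped_update(lam_prev, lam_new, rho):
    """Damped global update: lambda(s) <- (1 - rho) * lambda(s-1) + rho * lambda(s),
    with damping factor rho in (0, 1]. rho = 1 recovers the undamped update;
    smaller rho moves the global model more cautiously, improving stability."""
    if not 0 < rho <= 1:
        raise ValueError("damping factor rho must lie in (0, 1]")
    return (1 - rho) * np.asarray(lam_prev) + rho * np.asarray(lam_new)
```

With the damping factor 0.4 used in the Figure 5 runs, each global update keeps 60% of the previous parameter values.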
We use grid search to optimise all hyperparameters in terms of predictive accuracy and model log-likelihood using 1 random seed, and then run 5 independent random seeds using the best hyperparameters from the 1-seed runs. The reported results with 5 random seeds are the best results in terms of log-likelihood for each model. With BNNs using local averaging or virtual PVI clients, some seeds diverged when using the hyperparameters optimised using a single seed. These seeds were rerun with the same hyperparameter settings to produce 5 full runs. This might give the methods some extra advantage in the comparison, but since they still do not work too well, we can surmise that the methods are not well suited for the task. [...] Figure 5: Logistic regression, balanced MIMIC-III data with 10 clients: mean over 5 seeds with SEM, balanced split. a) Without DP, increasing the number of local partitions can lead to slower convergence, b) non-DP with clipping norm C, and (4, 10⁻⁵)-DP: without privacy, increasing the number of local partitions does not help (non-DP with clipping C = 1000), or even hurts performance (non-DP with clipping C = 1), while with DP, increasing the number of local partitions mitigates the effect of DP noise. Note that increasing the number of local partitions can also increase the bias due to clipping, especially with a tight clipping bound. In this experiment, we use the same fixed hyperparameters in all runs: number of global updates or communication rounds = 5, number of local steps = 50, learning rate = 10⁻², damping = 0.4.
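The clipping/noise trade-off discussed in the Figure 5 caption follows the standard clip-then-perturb pattern used by Gaussian-mechanism-based DP methods. The sketch below is illustrative only: `clip_and_perturb` is not from the paper's code, and in practice the noise standard deviation would be calibrated to a target (ε, δ) with a privacy accountant such as the Fourier accountant mentioned above.

```python
import numpy as np


def clip_and_perturb(update, C, noise_std, rng=None):
    """Clip an update vector to L2 norm at most C, then add isotropic
    Gaussian noise. A tighter clipping bound C lowers the noise needed for
    a given privacy level but increases clipping bias, as noted in the text."""
    rng = np.random.default_rng() if rng is None else rng
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    if norm > C:
        update = update * (C / norm)  # rescale onto the L2 ball of radius C
    return update + rng.normal(0.0, noise_std, size=update.shape)
```

Averaging several perturbed contributions before applying them to the global model reduces the effective noise, which is consistent with the figure's observation that more local partitions mitigate the effect of DP noise.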