Differentially Private Regression and Classification with Sparse Gaussian Processes
Authors: Michael Thomas Smith, Mauricio A. Alvarez, Neil D. Lawrence
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the above methods with a series of experiments. Firstly, we investigate the nonstationary covariance methods, deployed to reduce outlier noise. We explored their effect in one and two dimensions in Sections 6.1 and 6.2 respectively. In Section 6.3 we look at reasons for why varying the lengthscale doesn't reduce sensitivity to outliers. In Sections 6.4 and 6.5 we experiment with the binary DP classification method in one and two dimensions. In Section 6.6 we test the sparse DP classification method on the high dimensional MNIST data set. Finally, Section 6.7 demonstrates how DP (hyper)parameter selection might be performed. |
| Researcher Affiliation | Academia | Michael Thomas Smith, Mauricio A. Álvarez, Department of Computer Science, University of Sheffield, UK; Neil D. Lawrence, Department of Computer Science and Technology, University of Cambridge, UK |
| Pseudocode | Yes | Algorithm 1 Hyperparameter selection using the exponential mechanism. Require: M and Θ; the GP model and the hyperparameter configurations we will test. Require: Xall ∈ ℝ^(N×D), yall ∈ ℝ^N; inputs and outputs. Require: d > 0, the data sensitivity; ε > 0, δ > 0, the DP parameters. Require: κ > 1, the number of folds in the cross-validation. 1: function Hyperparameter Selection(Xall, yall, M, Θ, d, ε, δ, κ) |
| Open Source Code | No | The paper mentions "This paper and associated libraries provide a robust toolkit for combining differential privacy and Gaussian processes in a practical manner." However, it does not provide a specific link, repository, or explicit statement of code release for the methodology described in the paper. |
| Open Datasets | Yes | We use, as a simple demonstration, the heights and ages of 287 women from a census of the !Kung (Howell, N., 1967). We use the Home Equity Loans (HEL) data set (Scheule et al., 2017). ...To demonstrate the DP sparse classification method we turn to a higher dimensional data set. In this problem we try to classify MNIST digits... |
| Dataset Splits | Yes | For this experiment we split the data into two halves to avoid circular analysis; using one half for selecting hyperparameters, and the other for estimating the RMSE achieved if we had used those hyperparameters. The training half is again split in a 5-fold cross-validation step to estimate the SSE for each configuration of hyperparameters. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various machine learning concepts and algorithms but does not provide specific software library names with version numbers that would be necessary to reproduce the experiments (e.g., Python version, PyTorch/TensorFlow version, scikit-learn version). |
| Experiment Setup | Yes | We used an EQ kernel with lengthscale set a priori, to 15 years, as from our prior experience of human development this is the timescale over which gradients vary. Similarly we set the kernel variance to 10 cm² and the Gaussian likelihood noise variance to 25 cm². ...We search exponentially the three parameters (lengthscale 1, 5, 25, 125, 625 years; Gaussian noise variance 0.2, 1, 5, 25 cm²; kernel variance 1, 5, 25, 125 cm²). |
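The pseudocode quoted above selects a hyperparameter configuration by scoring each candidate with a cross-validated error and sampling one configuration via the exponential mechanism. A minimal sketch of that sampling step (the function name, the utility values, and the NumPy-based implementation are illustrative assumptions, not the paper's code):

```python
import numpy as np


def exponential_mechanism_select(utilities, sensitivity, epsilon, rng):
    """Sample an index i with probability proportional to
    exp(epsilon * u_i / (2 * sensitivity)).

    In Algorithm 1's setting, u_i would be a utility such as the negative
    cross-validated SSE of hyperparameter configuration i, and
    `sensitivity` the utility's sensitivity to changing one data point.
    """
    u = np.asarray(utilities, dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()            # normalise to a distribution
    return int(rng.choice(len(u), p=probs))


# Toy usage: three configurations with (assumed) utilities; large epsilon
# makes the selection concentrate on the best-scoring configuration.
rng = np.random.default_rng(0)
chosen = exponential_mechanism_select([0.0, 10.0, 5.0],
                                      sensitivity=1.0, epsilon=1000.0, rng=rng)
```

With a small ε the selection becomes nearly uniform, which is the privacy/utility trade-off the mechanism encodes.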
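The splitting protocol (half for selection, half for evaluation, with 5-fold cross-validation on the selection half) and the exponentially spaced hyperparameter grid quoted above can be sketched as follows. The helper name and the NumPy-only scaffolding are assumptions; the paper's actual GP fitting and SSE computation are omitted:

```python
from itertools import product

import numpy as np

# Exponentially spaced grid from the quoted setup (5 * 4 * 4 = 80 configurations).
lengthscales = [1, 5, 25, 125, 625]   # years
noise_vars = [0.2, 1, 5, 25]          # cm^2
kernel_vars = [1, 5, 25, 125]         # cm^2
configs = list(product(lengthscales, noise_vars, kernel_vars))


def split_and_fold(n, n_folds=5, seed=0):
    """Split n indices into a selection half and an evaluation half,
    then partition the selection half into n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    half = n // 2
    selection, evaluation = idx[:half], idx[half:]
    folds = np.array_split(selection, n_folds)
    return selection, evaluation, folds


# Toy usage on a data set the size of the !Kung census (287 rows).
selection, evaluation, folds = split_and_fold(287)
```

Each configuration would then be scored by summing the held-out SSE over the five folds, and the evaluation half used only to report the RMSE of the finally selected configuration, avoiding circular analysis.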