Differentially Private Regression and Classification with Sparse Gaussian Processes
Authors: Michael Thomas Smith, Mauricio A. Alvarez, Neil D. Lawrence
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the above methods with a series of experiments. Firstly, we investigate the nonstationary covariance methods, deployed to reduce outlier noise. We explored their effect in one and two dimensions in Sections 6.1 and 6.2 respectively. In Section 6.3 we look at reasons for why varying the lengthscale doesn't reduce sensitivity to outliers. In Sections 6.4 and 6.5 we experiment with the binary DP classification method in one and two dimensions. In Section 6.6 we test the sparse DP classification method on the high dimensional MNIST data set. Finally, Section 6.7 demonstrates how DP (hyper)parameter selection might be performed. |
| Researcher Affiliation | Academia | Michael Thomas Smith, Mauricio A. Álvarez, Department of Computer Science, University of Sheffield, UK; Neil D. Lawrence, Department of Computer Science and Technology, University of Cambridge, UK |
| Pseudocode | Yes | Algorithm 1 Hyperparameter selection using the exponential mechanism. Require: M and Θ; the GP model and the hyperparameter configurations we will test. Require: Xall ∈ ℝ^(N×D), yall ∈ ℝ^N; inputs and outputs. Require: d > 0, the data sensitivity; ε > 0, δ > 0, the DP parameters. Require: κ > 1, the number of folds in the cross-validation. 1: function Hyperparameter Selection(Xall, yall, M, Θ, d, ε, δ, κ) |
| Open Source Code | No | The paper mentions "This paper and associated libraries provide a robust toolkit for combining differential privacy and Gaussian processes in a practical manner." However, it does not provide a specific link, repository, or explicit statement of code release for the methodology described in the paper. |
| Open Datasets | Yes | We use, as a simple demonstration, the heights and ages of 287 women from a census of the !Kung (Howell, N., 1967). We use the Home Equity Loans (HEL) data set (Scheule et al., 2017). ...To demonstrate the DP sparse classification method we turn to a higher dimensional data set. In this problem we try to classify MNIST digits... |
| Dataset Splits | Yes | For this experiment we split the data into two halves to avoid circular analysis; using one half for selecting hyperparameters, and the other for estimating the RMSE achieved if we had used those hyperparameters. The training half is again split in a 5-fold cross-validation step to estimate the SSE for each configuration of hyperparameters. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various machine learning concepts and algorithms but does not provide specific software library names with version numbers that would be necessary to reproduce the experiments (e.g., Python version, PyTorch/TensorFlow version, scikit-learn version). |
| Experiment Setup | Yes | We used an EQ kernel with lengthscale set a priori, to 15 years, as from our prior experience of human development this is the timescale over which gradients vary. Similarly we set the kernel variance to 10 cm² and the Gaussian likelihood noise variance to 25 cm². ...We search exponentially the three parameters (lengthscale 1, 5, 25, 125, 625 years; Gaussian noise variance 0.2, 1, 5, 25 cm²; kernel variance 1, 5, 25, 125 cm²). |
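The pseudocode quoted above selects a hyperparameter configuration by scoring each candidate with a cross-validated error and sampling one configuration via the exponential mechanism. A minimal sketch of that sampling step (the function name, the utility values, and the NumPy-based implementation are illustrative assumptions, not the paper's code):

```python
import numpy as np


def exponential_mechanism_select(utilities, sensitivity, epsilon, rng):
    """Sample an index i with probability proportional to
    exp(epsilon * u_i / (2 * sensitivity)).

    In Algorithm 1's setting, u_i would be a utility such as the negative
    cross-validated SSE of hyperparameter configuration i, and
    `sensitivity` the utility's sensitivity to changing one data point.
    """
    u = np.asarray(utilities, dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()            # normalise to a distribution
    return int(rng.choice(len(u), p=probs))


# Toy usage: three configurations with (assumed) utilities; large epsilon
# makes the selection concentrate on the best-scoring configuration.
rng = np.random.default_rng(0)
chosen = exponential_mechanism_select([0.0, 10.0, 5.0],
                                      sensitivity=1.0, epsilon=1000.0, rng=rng)
```

With a small ε the selection becomes nearly uniform, which is the privacy/utility trade-off the mechanism encodes.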
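The splitting protocol (half for selection, half for evaluation, with 5-fold cross-validation on the selection half) and the exponentially spaced hyperparameter grid quoted above can be sketched as follows. The helper name and the NumPy-only scaffolding are assumptions; the paper's actual GP fitting and SSE computation are omitted:

```python
from itertools import product

import numpy as np

# Exponentially spaced grid from the quoted setup (5 * 4 * 4 = 80 configurations).
lengthscales = [1, 5, 25, 125, 625]   # years
noise_vars = [0.2, 1, 5, 25]          # cm^2
kernel_vars = [1, 5, 25, 125]         # cm^2
configs = list(product(lengthscales, noise_vars, kernel_vars))


def split_and_fold(n, n_folds=5, seed=0):
    """Split n indices into a selection half and an evaluation half,
    then partition the selection half into n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    half = n // 2
    selection, evaluation = idx[:half], idx[half:]
    folds = np.array_split(selection, n_folds)
    return selection, evaluation, folds


# Toy usage on a data set the size of the !Kung census (287 rows).
selection, evaluation, folds = split_and_fold(287)
```

Each configuration would then be scored by summing the held-out SSE over the five folds, and the evaluation half used only to report the RMSE of the finally selected configuration, avoiding circular analysis.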