Adversarial Robustness Guarantees for Gaussian Processes
Authors: Andrea Patane, Arno Blaas, Luca Laurenti, Luca Cardelli, Stephen Roberts, Marta Kwiatkowska
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on a collection of synthetic and standard benchmark data sets, including SPAM, MNIST and Fashion MNIST. We study the effect of approximate inference techniques on robustness and demonstrate how our method can be used for interpretability. Our empirical results suggest that the adversarial robustness of GPs increases with accurate posterior estimation. |
| Researcher Affiliation | Academia | Andrea Patane, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, United Kingdom; Arno Blaas, Department of Engineering Science, University of Oxford, Oxford, OX2 6ED, United Kingdom; Luca Laurenti, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, United Kingdom; Luca Cardelli, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, United Kingdom; Stephen Roberts, Department of Engineering Science, University of Oxford, Oxford, OX2 6ED, United Kingdom; Marta Kwiatkowska, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, United Kingdom |
| Pseudocode | Yes | Algorithm 1 (Branch and bound for π_min(T)). Input: input space subset T; error tolerance ε > 0; latent mean/variance functions μ(·) and Σ(·). Output: lower and upper bounds on π_min(T) with π_min^U(T) − π_min^L(T) ≤ ε |
| Open Source Code | Yes | 1. The code can be found at https://github.com/andreapatane/check-GPclass. |
| Open Datasets | Yes | We evaluate our methods on a collection of synthetic and standard benchmark data sets, including SPAM, MNIST and Fashion-MNIST. For the SPAM data set (Dua and Graff, 2017) ... For MNIST and F-MNIST ... (LeCun, 1998) ... (Xiao et al., 2017). Furthermore, we analyse the robustness of the Water Quality data set (Džeroski et al., 2000) |
| Dataset Splits | Yes | For the Synthetic2D data set we learn the GP over 1000 training samples and test it over 200 test samples. For MNIST and F-MNIST ... with 1000 training samples randomly picked from the two data sets. Finally, we standardise the Water Quality data set and use the full set of 16 input features to predict the 14 regression outputs. To do so, we learn a multi-output regression GP over a 50%/50% train/validation split of the full data set (1060 data points) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | We rely on the GPML Matlab toolbox for the training of two-class GPs (Rasmussen and Nickisch, 2010) and on the GPStuff toolbox for the training of multi-class GPs, sparse GP models and the multi-output regression model (Vanhatalo et al., 2013). |
| Experiment Setup | Yes | We learn the GP models using a squared-exponential kernel and zero mean prior and select the hyper-parameters by means of MLE (Rasmussen and Williams, 2006). We compute adversarial robustness in neighbourhoods of the form T = [x − γ, x + γ] around a given point x and for a range of γ > 0. Unless otherwise stated, we run the branch-and-bound algorithm until convergence up to an error threshold ε = 0.01. We vary the number of training points from 250 to 7500 and the number of inducing points (selected at random from the training points) from 100 to 500. |
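To make the branch-and-bound scheme quoted above concrete, the sketch below illustrates the same refine-until-ε idea on a deliberately simplified problem. It is a hypothetical illustration, not the paper's implementation: it bounds only the posterior *mean* of a 1-D squared-exponential-kernel GP over an interval T = [a, b] (the paper's Algorithm 1 bounds the class probability π_min(T) using both the latent mean μ(·) and variance Σ(·)). All function names here are invented for the example.

```python
import heapq
import numpy as np

def se_kernel(x, xi, ell=1.0):
    """Squared-exponential kernel k(x, xi) = exp(-(x - xi)^2 / (2 ell^2))."""
    return np.exp(-(x - xi) ** 2 / (2.0 * ell ** 2))

def mu(x, X, alpha, ell=1.0):
    """GP posterior mean mu(x) = sum_i alpha_i k(x, x_i)."""
    return float(np.sum(alpha * se_kernel(x, X, ell)))

def kernel_bounds(l, u, xi, ell=1.0):
    """Interval bounds on k(x, xi) for x in [l, u]; k decreases in |x - xi|."""
    d2_min = 0.0 if l <= xi <= u else min((l - xi) ** 2, (u - xi) ** 2)
    d2_max = max((l - xi) ** 2, (u - xi) ** 2)
    return np.exp(-d2_max / (2.0 * ell ** 2)), np.exp(-d2_min / (2.0 * ell ** 2))

def mean_lower_bound(l, u, X, alpha, ell=1.0):
    """Sound lower bound on min over [l, u] of mu, via per-term kernel bounds."""
    lb = 0.0
    for xi, a in zip(X, alpha):
        k_lo, k_hi = kernel_bounds(l, u, xi, ell)
        lb += a * (k_lo if a > 0 else k_hi)
    return lb

def bnb_min(X, alpha, a, b, eps=1e-2, ell=1.0):
    """Branch and bound: return (lb, ub) with lb <= min mu on [a, b] <= ub
    and ub - lb <= eps, mirroring the error-tolerance loop of Algorithm 1."""
    heap = [(mean_lower_bound(a, b, X, alpha, ell), a, b)]
    ub = mu(0.5 * (a + b), X, alpha, ell)   # any feasible point is an upper bound
    while heap[0][0] < ub - eps:            # heap always holds a partition of [a, b]
        _, l, u = heapq.heappop(heap)       # refine the most promising box
        m = 0.5 * (l + u)
        ub = min(ub, mu(m, X, alpha, ell))
        for bl, bu in ((l, m), (m, u)):
            heapq.heappush(heap, (mean_lower_bound(bl, bu, X, alpha, ell), bl, bu))
    return heap[0][0], ub
```

Because the interval bounds tighten as boxes shrink, the gap between the certified lower bound and the best evaluated point is guaranteed to fall below ε, which is the same mechanism that lets the paper run its branch-and-bound "until convergence up to an error threshold ε = 0.01".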