Dual Parameterization of Sparse Variational Gaussian Processes

Authors: Vincent Adam, Paul E. Chang, Mohammad Emtiyaz Khan, Arno Solin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Empirical Evaluation: We conduct experiments to highlight the advantages of using the dual parameterization. Firstly, we study the effects of the improved objective for hyperparameter learning of t-SVGP versus q-SVGP. We study the objective being optimized for a single M-step, after an E-step run until convergence. We then show a full sequence of EM iterations on small data sets. For large-scale data, where running steps to convergence is expensive, we use partial E- and M-steps and mini-batching. Our improved bound and faster natural gradient computations show benefits in both settings. (See the training-loop sketch after this table.)
Researcher Affiliation | Collaboration | Vincent Adam (Aalto University / Secondmind.ai, Espoo, Finland / Cambridge, UK); Paul E. Chang (Aalto University, Espoo, Finland); Mohammad Emtiyaz Khan (RIKEN Center for AI Project, Tokyo, Japan); Arno Solin (Aalto University, Espoo, Finland)
Pseudocode | Yes | The full algorithm is given in App. E.
Open Source Code | Yes | We provide a reference implementation of our method under the GPflow framework at https://github.com/AaltoML/t-SVGP.
Open Datasets | Yes | MNIST ([23], available under CC BY-SA 3.0); We use common small and mid-sized UCI data sets to test the performance of our method.
Dataset Splits | Yes | We perform 5-fold cross-validation, with the results in Fig. 3 showing the mean over the folds for ELBO and NLPD. (See the cross-validation sketch after this table.)
Hardware Specification | Yes | We compare wall-clock time to compute 150 steps of the algorithm for both methods in terms of NLPD and ELBO, taking single E- and M-steps (MacBook Pro, 2 GHz CPU, 16 GB RAM).
Software Dependencies | Yes | We compare against the state-of-the-art implementation of SVGP in GPflow ([26], v2.2.1).
Experiment Setup | Yes | All experiments are performed with a batch size of n_b = 200 and m = 100 inducing points, and the optimization is run until convergence using the Adam optimizer for the hyperparameters (M-step); Table 1: NLPD on MNIST benchmarks for different learning rates and E- and M-steps. (See the setup sketch after this table.)
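
As a rough illustration of the E/M alternation quoted under "Research Type", the sketch below uses GPflow's stock SVGP (q-SVGP) model with a natural-gradient E-step on the variational parameters and an Adam M-step on the hyperparameters. The toy data, kernel, likelihood, step sizes, and number of steps are assumptions, and the updates shown are the standard GPflow ones, not the paper's dual-parameterized (t-SVGP) computations.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Toy regression data standing in for a real data set (illustrative only).
X = np.random.rand(500, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(500, 1)

# Standard q-SVGP model with a handful of inducing points.
model = gpflow.models.SVGP(
    gpflow.kernels.Matern52(), gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:20].copy(), num_data=500,
)

# The E-step acts on (q_mu, q_sqrt) via natural gradients;
# keep Adam (the M-step) away from the variational parameters.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)
natgrad = gpflow.optimizers.NaturalGradient(gamma=0.5)
adam = tf.optimizers.Adam(learning_rate=0.01)

loss = model.training_loss_closure((X, Y), compile=True)
for _ in range(150):
    # Partial E-step: one natural-gradient update of q(u).
    natgrad.minimize(loss, var_list=[(model.q_mu, model.q_sqrt)])
    # Partial M-step: one Adam update of kernel/likelihood hyperparameters.
    adam.minimize(loss, var_list=model.trainable_variables)
```

Running the E-step to convergence before each M-step (the small-data regime described in the quote) would simply repeat the natural-gradient update several times per Adam update.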
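For the 5-fold protocol mentioned under "Dataset Splits", a generic sketch is below. scikit-learn's KFold is an assumption on my part, and `fit_model` and `nlpd` are hypothetical placeholders for model training and the negative log predictive density; the paper reports the mean over folds in the same way.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, Y, fit_model, nlpd, n_splits=5, seed=0):
    """Return the mean held-out NLPD over the folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        model = fit_model(X[train_idx], Y[train_idx])          # hypothetical trainer
        scores.append(nlpd(model, X[test_idx], Y[test_idx]))   # hypothetical metric
    return np.mean(scores)
```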
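Finally, to mirror the stated configuration under "Experiment Setup" (mini-batches of n_b = 200, m = 100 inducing points, Adam on the hyperparameters), a hedged GPflow setup might look as follows. The placeholder arrays, inducing-point initialization, and learning rate are assumptions; the E/M loop from the first sketch would then consume batches from this iterator.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Placeholder arrays standing in for a UCI training split.
N, D = 5000, 8
X = np.random.randn(N, D)
Y = np.random.randn(N, 1)

# m = 100 inducing points, initialized from a random subset
# (an assumption; the paper does not state the initialization used).
Z = X[np.random.choice(N, 100, replace=False)].copy()
model = gpflow.models.SVGP(
    gpflow.kernels.Matern52(), gpflow.likelihoods.Gaussian(),
    inducing_variable=Z, num_data=N,
)

# Mini-batches of size n_b = 200 fed through a repeating, shuffled pipeline.
train_iter = iter(
    tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(N).batch(200)
)
loss = model.training_loss_closure(train_iter, compile=True)

# Adam handles the hyperparameters in the M-step; the learning rate is illustrative.
adam = tf.optimizers.Adam(learning_rate=0.01)
```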