Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Authors: Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael Osborne, Frank Wood
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks. |
| Researcher Affiliation | Academia | Vu Nguyen (University of Oxford); Vaden Masrani (University of British Columbia); Rob Brekelmans (USC Information Sciences Institute); Michael A. Osborne (University of Oxford); Frank Wood (University of British Columbia) |
| Pseudocode | Yes | Algorithm 1 GP-bandit for TVO (high level) |
| Open Source Code | Yes | Our code is available at http://github.com/ntienvu/tvo_gp_bandit. |
| Open Datasets | Yes | We demonstrate the effectiveness of our method for training VAEs [17] on MNIST and Fashion MNIST, and a Sigmoid Belief Network [27] on binarized MNIST and binarized Omniglot, using the TVO objective. |
| Dataset Splits | No | The paper mentions evaluating on 'test log evidence' and 'test KL divergence' but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) used for reproduction. |
| Hardware Specification | No | The paper mentions using computational resources from 'West Grid' and 'Compute Canada' in the acknowledgments, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'), needed to replicate the experiment. |
| Experiment Setup | Yes | All continuous VAEs use a two-layer encoder and decoder with 200 hidden units per layer, and a 20-dimensional latent space. All experiments use the Adam [16] optimizer with a learning rate of 1e-3, with gradient clipping at 100. We evaluate our GP-bandit for S ∈ {10, 50} and d ∈ {2, 5, 10, 15} and, for each configuration, train until convergence using 5 random seeds. We set the update frequency w = 6 initially and increment w by one after every 10 bandit iterations to account for smaller objective changes later in training, and update early if ∆L_t ≥ 0.05. We found that selecting βj too close to either 0 or 1 could negatively affect performance, and thus restrict β ∈ [0.05, 0.95]^d in all experiments. |
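The experiment-setup row describes two reproducible mechanics: candidate integration points β are confined to [0.05, 0.95]^d, and the bandit update frequency w starts at 6 and grows by one every 10 bandit iterations. A minimal Python sketch of just these two rules follows; the function names (`clip_betas`, `update_frequency`) are illustrative and are not taken from the paper's released code.

```python
# Hedged sketch of two setup rules quoted above; names are hypothetical.
BETA_MIN, BETA_MAX = 0.05, 0.95  # paper restricts beta to [0.05, 0.95]^d

def clip_betas(betas):
    """Keep each coordinate of a candidate beta vector inside [0.05, 0.95]."""
    return [min(max(float(b), BETA_MIN), BETA_MAX) for b in betas]

def update_frequency(bandit_iteration, w0=6, every=10):
    """w starts at w0 = 6 and increments by one every `every` = 10 bandit iterations."""
    return w0 + bandit_iteration // every

print(clip_betas([0.01, 0.5, 0.99]))              # -> [0.05, 0.5, 0.95]
print(update_frequency(0), update_frequency(25))  # -> 6 8
```

This sketch only reproduces the stated schedule; the early-update trigger (∆L_t ≥ 0.05) and the GP-bandit acquisition itself are described in the paper's Algorithm 1 and released code.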