Monte Carlo Gradient Estimation in Machine Learning

Authors: Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation... Using a set of case studies in Section 8, we explore the behaviour of gradient estimators in different settings to make their application and design more concrete... 8.3. Empirical Comparisons... We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017)... Figure 8 shows the variance of the two estimators for the gradient with respect to the log of the standard deviation, log σ_d, for four input features.
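The quoted case study compares the variance of the score-function and pathwise estimators for the gradient with respect to the log standard deviation of a Gaussian measure. A minimal sketch of that comparison on a toy Gaussian expectation (the test function f, parameter values, and sample count are illustrative assumptions, not the paper's Bayesian logistic regression setup):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.2
n = 200_000  # Monte Carlo samples per estimator

f = lambda x: x ** 2  # toy test function; true grad wrt log(sigma) is 2*sigma**2

# Score-function (REINFORCE) estimator:
#   f(x) * d/dlog(sigma) log N(x; mu, sigma^2) = f(x) * ((x - mu)^2 / sigma^2 - 1)
x = rng.normal(mu, sigma, n)
g_score = f(x) * ((x - mu) ** 2 / sigma ** 2 - 1.0)

# Pathwise (reparameterisation) estimator:
#   x = mu + sigma * eps, so d f(x)/dlog(sigma) = f'(x) * sigma * eps
eps = rng.standard_normal(n)
x_path = mu + sigma * eps
g_path = 2.0 * x_path * sigma * eps

print(g_score.mean(), g_path.mean())  # both ≈ 2 * sigma**2 = 2.88
print(g_score.var(), g_path.var())    # pathwise variance is much lower here
```

Both estimators are unbiased for the same gradient; the gap between their sample variances is the kind of behaviour Figure 8 of the paper reports per input feature.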
Researcher Affiliation | Collaboration | Shakir Mohamed (1), Mihaela Rosca (1, 2), Michael Figurnov (1), Andriy Mnih (1). Equal contributions; (1) DeepMind, London; (2) University College London.
Pseudocode | No | The paper describes various gradient estimation methods and variance reduction techniques conceptually and mathematically, but it does not include any explicitly labeled pseudocode blocks or algorithms with structured, code-like formatting.
Open Source Code | Yes | Reproducibility. Code to reproduce Figures 2 and 3, sets of unit tests for gradient estimation, and for the experimental case study using Bayesian logistic regression in Section 8.3 are available at https://www.github.com/deepmind/mc_gradients.
Open Datasets | Yes | We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017), which has I = 569 data points and D = 31 features.
Dataset Splits | No | We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017), which has I = 569 data points and D = 31 features. For evaluation, we always use the entire data set and 1000 posterior samples.
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions that code is available in a GitHub repository but does not explicitly list any software dependencies or their version numbers within the text.
Experiment Setup | Yes | We optimize the variational bound using stochastic gradient descent, and compare learning of the variational parameters using both the score-function and pathwise estimator... We use stochastic gradient descent for optimisation, with cosine learning rate decay (Loshchilov and Hutter, 2017) unless stated otherwise... using a batch size of B = 32 data points, a learning rate of 10⁻³ and estimating the gradient using N = 50 samples from the measure... Figure 10 shows a similar setup to Figure 9, with a fixed batch size B = 32, but using fewer samples from the measure, N = 5.
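To make the quoted training configuration concrete, here is a rough sketch of cosine learning-rate decay (Loshchilov and Hutter, 2017) driving an SGD loop with the stated base learning rate of 10⁻³ and N = 50 gradient samples per step. The quadratic objective and step count are placeholder assumptions, not the paper's variational objective:

```python
import numpy as np

def cosine_lr(step, total_steps, lr0=1e-3):
    # Cosine decay: learning rate falls smoothly from lr0 to 0 over total_steps
    return lr0 * 0.5 * (1.0 + np.cos(np.pi * step / total_steps))

rng = np.random.default_rng(0)
theta, total_steps = 5.0, 1000  # toy scalar parameter and a placeholder horizon

for step in range(total_steps):
    # Noisy gradient of the toy objective 0.5 * theta**2,
    # averaged over N = 50 Monte Carlo samples as in the quoted setup
    grad = np.mean(theta + rng.standard_normal(50))
    theta -= cosine_lr(step, total_steps) * grad
```

Averaging the per-sample gradients before the update mirrors the paper's use of N samples from the measure; shrinking N (e.g. to 5, as in Figure 10) raises the variance of each step.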