Monte Carlo Gradient Estimation in Machine Learning
Authors: Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation... Using a set of case studies in Section 8, we explore the behaviour of gradient estimators in different settings to make their application and design more concrete... 8.3. Empirical Comparisons... We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017)... Figure 8 shows the variance of the two estimators for the gradient with respect to the log of the standard deviation log σ_d for four input features. |
| Researcher Affiliation | Collaboration | Shakir Mohamed¹, Mihaela Rosca¹², Michael Figurnov¹, Andriy Mnih¹ (equal contributions); ¹ DeepMind, London; ² University College London |
| Pseudocode | No | The paper describes various gradient estimation methods and variance reduction techniques conceptually and mathematically, but it does not include any explicitly labeled pseudocode blocks or algorithms with structured, code-like formatting. |
| Open Source Code | Yes | Reproducibility. Code to reproduce Figures 2 and 3, sets of unit tests for gradient estimation, and for the experimental case study using Bayesian logistic regression in Section 8.3 are available at https://www.github.com/deepmind/mc_gradients . |
| Open Datasets | Yes | We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017), which has I = 569 data points and D = 31 features. |
| Dataset Splits | No | We use the UCI Women's Breast Cancer data set (Dua and Graff, 2017), which has I = 569 data points and D = 31 features. For evaluation, we always use the entire data set and 1000 posterior samples. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions that code is available in a GitHub repository but does not explicitly list any software dependencies or their version numbers within the text. |
| Experiment Setup | Yes | We optimize the variational bound using stochastic gradient descent, and compare learning of the variational parameters using both the score-function and pathwise estimator... We use stochastic gradient descent for optimisation, with cosine learning rate decay (Loshchilov and Hutter, 2017) unless stated otherwise... using a batch size of B = 32 data points, a learning rate of 10⁻³ and estimating the gradient using N = 50 samples from the measure... Figure 10 shows a similar setup to Figure 9, with a fixed batch size B = 32, but using fewer samples from the measure, N = 5. |
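The two estimators named in the rows above (score-function and pathwise) can be contrasted on a toy problem. The sketch below is illustrative only, not the paper's Bayesian logistic regression experiment: the objective f, the Gaussian measure, and the parameter values are assumptions chosen for clarity, with N = 50 samples as in the quoted setup. Both estimators target the same gradient, but typically with very different variance, which is the phenomenon Figure 8 of the paper examines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: E_{x ~ N(mu, sigma^2)}[f(x)] with f(x) = x^2.
# Its exact gradient w.r.t. mu is 2*mu (here 3.0); both estimators target it.
f = lambda x: x ** 2
mu, sigma, N = 1.5, 1.0, 50  # N = 50 samples, as in the quoted setup

# Score-function (REINFORCE) estimator:
#   grad ≈ mean[ f(x) * d/dmu log N(x; mu, sigma^2) ],  x ~ N(mu, sigma^2)
x = rng.normal(mu, sigma, size=N)
score = (x - mu) / sigma ** 2          # d/dmu of the Gaussian log-density
grad_sf = np.mean(f(x) * score)

# Pathwise (reparameterisation) estimator:
#   x = mu + sigma * eps, eps ~ N(0, 1), so grad ≈ mean[ f'(mu + sigma*eps) ]
eps = rng.normal(size=N)
grad_pw = np.mean(2 * (mu + sigma * eps))  # f'(x) = 2x

print(grad_sf, grad_pw)  # both estimates are near the true gradient 2*mu = 3.0
```

Repeating each estimate over many seeds shows the pathwise estimator's variance is markedly lower here, consistent with the comparisons the paper reports in Section 8.3.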