Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

Authors: Yee Whye Teh, Alexandre H. Thiery, Sebastian J. Vollmer

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that, under verifiable assumptions, the algorithm is consistent, satisfies a central limit theorem (CLT) and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0}. We leverage this analysis to give practical recommendations for the notoriously difficult tuning of this algorithm: it is asymptotically optimal to use a step-size sequence of the type δ_m ≍ m^{-1/3}, leading to an algorithm whose mean squared error (MSE) decreases at rate O(m^{-1/3}). [...] In this section we illustrate the use of the SGLD method on a simple Gaussian toy model and on a Bayesian logistic regression problem. We verify that both models satisfy Assumption 4, the main assumption needed for our asymptotic results to hold. Simulations are then performed to empirically confirm our theory; for step-size sequences of the type δ_m = (m_0 + m)^{-α}, both the rate of decay of the MSE and the impact of the sub-sampling scheme are investigated.
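The recommended schedule from the quote, δ_m = (m_0 + m)^{-α} with the asymptotically optimal α = 1/3, can be sketched as follows; the offset m_0 = 10 and the horizon are illustrative choices, not values from the paper:

```python
m0, alpha = 10, 1 / 3  # m0 is a hypothetical offset; alpha = 1/3 is the rate from the quote

def step_size(m):
    """Schedule delta_m = (m0 + m)^(-alpha) from the quoted recommendation."""
    return (m0 + m) ** (-alpha)

deltas = [step_size(m) for m in range(1, 10**4 + 1)]

# The sequence vanishes but is non-summable, so the accumulated "diffusion time"
# sum_m delta_m keeps growing (like m^(2/3)) and the chain keeps moving.
total_time = sum(deltas)
```

With α = 1/3 the quoted result gives an MSE decaying at rate O(m^{-1/3}); smaller α gives larger bias, larger α gives larger variance.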
Researcher Affiliation | Academia | Yee Whye Teh EMAIL Department of Statistics University of Oxford 24-29 St Giles Oxford OX1 3LB UK; Alexandre H. Thiery EMAIL Department of Statistics and Applied Probability National University of Singapore 21 Lower Kent Ridge Road Singapore 119077; Sebastian J. Vollmer EMAIL Department of Statistics University of Oxford 24-29 St Giles Oxford OX1 3LB UK
Pseudocode | No | In summary, the SGLD algorithm can be described as follows. For a sequence of asymptotically vanishing time-steps (δ_m)_{m≥0} and an initial parameter θ_0 ∈ R^d, if the current position is θ_{m−1}, the next position θ_m is defined through the recursion θ_m = θ_{m−1} + (1/2) δ_m ∇ log π(θ_{m−1}, U_m) + δ_m^{1/2} η_m (7) for an i.i.d. sequence η_m ∼ N(0, I_d) and an independent i.i.d. sequence U_m of auxiliary random variables. This is the equivalent of the Euler-Maruyama discretization (3) of the Langevin diffusion (1) with a decreasing sequence of step-sizes and a stochastic estimate of the gradient term.
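Recursion (7) can be sketched in Python for the paper's Gaussian toy model. The constants σ_θ = 1, σ_x = 5, N = 100 follow the quoted setup; the subsample size n, the seed, the true parameter value, and the iteration count are illustrative assumptions:

```python
import math
import random

random.seed(0)

# Toy Gaussian model from the quoted setup: prior θ ~ N(0, σ_θ²),
# data x_i ~ N(θ, σ_x²). The subsample size n is an assumption.
sigma_theta, sigma_x = 1.0, 5.0
N, n = 100, 10
data = [random.gauss(0.5, sigma_x) for _ in range(N)]

def stochastic_grad(theta):
    """Unbiased subsample estimate of the gradient of log π at θ."""
    grad_prior = -theta / sigma_theta ** 2
    grad_lik = (N / n) * sum((x - theta) / sigma_x ** 2
                             for x in random.sample(data, n))
    return grad_prior + grad_lik

def sgld(n_steps, theta0=0.0, m0=10, alpha=1 / 3):
    """Recursion (7): θ_m = θ_{m-1} + (δ_m/2) ∇log π(θ_{m-1}, U_m) + δ_m^{1/2} η_m."""
    theta, traj, deltas = theta0, [], []
    for m in range(1, n_steps + 1):
        delta = (m0 + m) ** (-alpha)      # decreasing step-size sequence
        eta = random.gauss(0.0, 1.0)      # η_m ~ N(0, 1) in one dimension
        theta = (theta + 0.5 * delta * stochastic_grad(theta)
                 + math.sqrt(delta) * eta)
        traj.append(theta)
        deltas.append(delta)
    return traj, deltas

traj, deltas = sgld(5000)
# Step-size-weighted average of the trajectory as a posterior-mean estimate.
est = sum(d * t for d, t in zip(deltas, traj)) / sum(deltas)
# Exact posterior mean of the conjugate Gaussian model, for comparison.
post_mean = (sum(data) / sigma_x ** 2) / (1 / sigma_theta ** 2 + N / sigma_x ** 2)
```

Under the decreasing schedule δ_m = (m_0 + m)^{-α}, `est` should track `post_mean` as the number of iterations grows; this is the consistency property the paper establishes.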
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | In this section we illustrate the use of the SGLD method on a simple Gaussian toy model and on a Bayesian logistic regression problem. [...] We chose σ_θ = 1, σ_x = 5 and created a data set consisting of N = 100 data points simulated from the model. [...] We consider a simulated dataset where d = 3 and N = 1000.
Dataset Splits | No | The paper mentions subsample sizes for gradient estimation (e.g., "subsample sizes n = 1, 5, 10, 50, 100") and simulating data, but does not provide specific train/test/validation splits for model evaluation.
Hardware Specification | No | The paper does not contain any specific details about the hardware used for running experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers.
Experiment Setup | Yes | We used step sizes δ_m = (m + m_0(α))^{-α}, for α ∈ {0.1, 0.2, 0.3, 0.33, 0.4, 0.5}, where m_0(α) is chosen such that δ_1 is less than the posterior standard deviation. [...] For a fair comparison we tune the MALA to an acceptance rate of approximately 0.574 following the findings of Roberts and Rosenthal (1998). For the SGLD-based variance estimate of the first component for n = 30 we choose δ_m = (a·m + b)^{-0.38} as step sizes and optimise over the choices of a and b. This is achieved by estimating the MSE for choices of a and b on a log-scale grid based on 512 independent runs. The estimates based on 20 and 1000 effective iterations through the data set are visualised as averages in the heat maps in Figure 5.
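The quoted tuning procedure (MSE of a variance estimate evaluated over a log-scale grid in a and b, averaged across independent runs) can be sketched as below. Everything concrete here is a stand-in: the target is a 1-d N(0, 1) rather than the paper's logistic regression, and the grid values, run count, and horizon are made up for illustration:

```python
import math
import random

random.seed(1)

def step_size(m, a, b, alpha=0.38):
    # Schedule from the quote: delta_m = (a*m + b)^(-alpha)
    return (a * m + b) ** (-alpha)

def run_sgld_estimate(a, b, n_steps=200):
    """Hypothetical stand-in: a 1-d Langevin run on a N(0, 1) target,
    returning the step-weighted estimate of the target variance."""
    theta, num, den = 0.0, 0.0, 0.0
    for m in range(1, n_steps + 1):
        d = step_size(m, a, b)
        # grad log π(θ) = -θ for the standard Gaussian target
        theta += -0.5 * d * theta + math.sqrt(d) * random.gauss(0.0, 1.0)
        num += d * theta ** 2          # weighted second moment
        den += d
    return num / den

def mse_on_grid(grid_a, grid_b, n_runs=32, target=1.0):
    """Estimate the MSE of the variance estimate over independent runs."""
    mse = {}
    for a in grid_a:
        for b in grid_b:
            errs = [(run_sgld_estimate(a, b) - target) ** 2
                    for _ in range(n_runs)]
            mse[(a, b)] = sum(errs) / n_runs
    return mse

# Log-scale grids, as in the quoted setup (the values themselves are invented):
grid_a = [10.0 ** k for k in (-2, -1, 0)]
grid_b = [10.0 ** k for k in (0, 1, 2)]
mse = mse_on_grid(grid_a, grid_b)
best = min(mse, key=mse.get)           # (a, b) pair minimising estimated MSE
```

The paper's version of this search uses 512 independent runs and reports the averaged MSE as heat maps over the (a, b) grid (Figure 5).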