Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics

Authors: Yee Whye Teh, Alexandre H. Thiery, Sebastian J. Vollmer

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that, under verifiable assumptions, the algorithm is consistent, satisfies a central limit theorem (CLT) and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-size sequence (δ_m)_{m≥0}. We leverage this analysis to give practical recommendations for the notoriously difficult tuning of this algorithm: it is asymptotically optimal to use a step-size sequence of the type δ_m ≍ m^{-1/3}, leading to an algorithm whose mean squared error (MSE) decreases at rate O(m^{-1/3}). [...] In this section we illustrate the use of the SGLD method on a simple Gaussian toy model and on a Bayesian logistic regression problem. We verify that both models satisfy Assumption 4, the main assumption needed for our asymptotic results to hold. Simulations are then performed to empirically confirm our theory; for step-size sequences of the type δ_m = (m_0 + m)^{-α}, both the rate of decay of the MSE and the impact of the sub-sampling scheme are investigated.
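The recommended schedule from the quote, δ_m = (m_0 + m)^{-α} with the asymptotically optimal α = 1/3, can be sketched as follows; the offset m_0 = 10 and the horizon are illustrative choices, not values from the paper:

```python
m0, alpha = 10, 1 / 3  # m0 is a hypothetical offset; alpha = 1/3 is the rate from the quote

def step_size(m):
    """Schedule delta_m = (m0 + m)^(-alpha) from the quoted recommendation."""
    return (m0 + m) ** (-alpha)

deltas = [step_size(m) for m in range(1, 10**4 + 1)]

# The sequence vanishes but is non-summable, so the accumulated "diffusion time"
# sum_m delta_m keeps growing (like m^(2/3)) and the chain keeps moving.
total_time = sum(deltas)
```

With α = 1/3 the quoted result gives an MSE decaying at rate O(m^{-1/3}); smaller α gives larger bias, larger α gives larger variance.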
Researcher Affiliation | Academia | Yee Whye Teh EMAIL Department of Statistics University of Oxford 24-29 St Giles Oxford OX1 3LB UK; Alexandre H. Thiery EMAIL Department of Statistics and Applied Probability National University of Singapore 21 Lower Kent Ridge Road Singapore 119077; Sebastian J. Vollmer EMAIL Department of Statistics University of Oxford 24-29 St Giles Oxford OX1 3LB UK
Pseudocode | No | In summary, the SGLD algorithm can be described as follows. For a sequence of asymptotically vanishing time-steps (δ_m)_{m≥0} and an initial parameter θ_0 ∈ R^d, if the current position is θ_{m−1}, the next position θ_m is defined through the recursion θ_m = θ_{m−1} + (1/2) δ_m ∇ log π(θ_{m−1}, U_m) + δ_m^{1/2} η_m (7) for an i.i.d. sequence η_m ∼ N(0, I_d) and an independent i.i.d. sequence U_m of auxiliary random variables. This is the equivalent of the Euler-Maruyama discretization (3) of the Langevin diffusion (1) with a decreasing sequence of step-sizes and a stochastic estimate of the gradient term.
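Recursion (7) can be sketched in Python for the paper's Gaussian toy model. The constants σ_θ = 1, σ_x = 5, N = 100 follow the quoted setup; the subsample size n, the seed, the true parameter value, and the iteration count are illustrative assumptions:

```python
import math
import random

random.seed(0)

# Toy Gaussian model from the quoted setup: prior θ ~ N(0, σ_θ²),
# data x_i ~ N(θ, σ_x²). The subsample size n is an assumption.
sigma_theta, sigma_x = 1.0, 5.0
N, n = 100, 10
data = [random.gauss(0.5, sigma_x) for _ in range(N)]

def stochastic_grad(theta):
    """Unbiased subsample estimate of the gradient of log π at θ."""
    grad_prior = -theta / sigma_theta ** 2
    grad_lik = (N / n) * sum((x - theta) / sigma_x ** 2
                             for x in random.sample(data, n))
    return grad_prior + grad_lik

def sgld(n_steps, theta0=0.0, m0=10, alpha=1 / 3):
    """Recursion (7): θ_m = θ_{m-1} + (δ_m/2) ∇log π(θ_{m-1}, U_m) + δ_m^{1/2} η_m."""
    theta, traj, deltas = theta0, [], []
    for m in range(1, n_steps + 1):
        delta = (m0 + m) ** (-alpha)      # decreasing step-size sequence
        eta = random.gauss(0.0, 1.0)      # η_m ~ N(0, 1) in one dimension
        theta = (theta + 0.5 * delta * stochastic_grad(theta)
                 + math.sqrt(delta) * eta)
        traj.append(theta)
        deltas.append(delta)
    return traj, deltas

traj, deltas = sgld(5000)
# Step-size-weighted average of the trajectory as a posterior-mean estimate.
est = sum(d * t for d, t in zip(deltas, traj)) / sum(deltas)
# Exact posterior mean of the conjugate Gaussian model, for comparison.
post_mean = (sum(data) / sigma_x ** 2) / (1 / sigma_theta ** 2 + N / sigma_x ** 2)
```

Under the decreasing schedule δ_m = (m_0 + m)^{-α}, `est` should track `post_mean` as the number of iterations grows; this is the consistency property the paper establishes.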
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | In this section we illustrate the use of the SGLD method on a simple Gaussian toy model and on a Bayesian logistic regression problem. [...] We chose σ_θ = 1, σ_x = 5 and created a data set consisting of N = 100 data points simulated from the model. [...] We consider a simulated dataset where d = 3 and N = 1000.
Dataset Splits | No | The paper mentions subsample sizes for gradient estimation (e.g., "subsample sizes n = 1, 5, 10, 50, 100") and simulating data, but does not provide specific train/test/validation splits for model evaluation.
Hardware Specification | No | The paper does not contain any specific details about the hardware used for running experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers.
Experiment Setup | Yes | We used step sizes δ_m = (m + m_0(α))^{-α}, for α ∈ {0.1, 0.2, 0.3, 0.33, 0.4, 0.5}, where m_0(α) is chosen such that δ_1 is less than the posterior standard deviation. [...] For a fair comparison we tune the MALA to an acceptance rate of approximately 0.574 following the findings of Roberts and Rosenthal (1998). For the SGLD-based variance estimate of the first component for n = 30 we choose δ_m = (a·m + b)^{-0.38} as step sizes and optimise over the choices of a and b. This is achieved by estimating the MSE for choices of a and b on a log-scale grid based on 512 independent runs. The estimates based on 20 and 1000 effective iterations through the data set are visualised as averages in the heat maps in Figure 5.
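The quoted tuning procedure (MSE of a variance estimate evaluated over a log-scale grid in a and b, averaged across independent runs) can be sketched as below. Everything concrete here is a stand-in: the target is a 1-d N(0, 1) rather than the paper's logistic regression, and the grid values, run count, and horizon are made up for illustration:

```python
import math
import random

random.seed(1)

def step_size(m, a, b, alpha=0.38):
    # Schedule from the quote: delta_m = (a*m + b)^(-alpha)
    return (a * m + b) ** (-alpha)

def run_sgld_estimate(a, b, n_steps=200):
    """Hypothetical stand-in: a 1-d Langevin run on a N(0, 1) target,
    returning the step-weighted estimate of the target variance."""
    theta, num, den = 0.0, 0.0, 0.0
    for m in range(1, n_steps + 1):
        d = step_size(m, a, b)
        # grad log π(θ) = -θ for the standard Gaussian target
        theta += -0.5 * d * theta + math.sqrt(d) * random.gauss(0.0, 1.0)
        num += d * theta ** 2          # weighted second moment
        den += d
    return num / den

def mse_on_grid(grid_a, grid_b, n_runs=32, target=1.0):
    """Estimate the MSE of the variance estimate over independent runs."""
    mse = {}
    for a in grid_a:
        for b in grid_b:
            errs = [(run_sgld_estimate(a, b) - target) ** 2
                    for _ in range(n_runs)]
            mse[(a, b)] = sum(errs) / n_runs
    return mse

# Log-scale grids, as in the quoted setup (the values themselves are invented):
grid_a = [10.0 ** k for k in (-2, -1, 0)]
grid_b = [10.0 ** k for k in (0, 1, 2)]
mse = mse_on_grid(grid_a, grid_b)
best = min(mse, key=mse.get)           # (a, b) pair minimising estimated MSE
```

The paper's version of this search uses 512 independent runs and reports the averaged MSE as heat maps over the (a, b) grid (Figure 5).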