Stochastic Gradient Descent as Approximate Bayesian Inference

Authors: Stephan Mandt, Matthew D. Hoffman, David M. Blei

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "7. Experiments. We test our theoretical assumptions from Section 3 and find good experimental evidence that they are reasonable in some settings. We also investigate iterate averaging and show that the assumptions outlined in 6.2 result in samples from a close approximation to the posterior. We also compare against other approximate inference algorithms, including SGLD (Welling and Teh, 2011), NUTS (Hoffman and Gelman, 2014), and black-box variational inference (BBVI) using Gaussian reparametrization gradients (Kucukelbir et al., 2015). In Section 7.3 we show that constant SGD lets us optimize hyperparameters in a Bayesian model."
Researcher Affiliation | Collaboration | Stephan Mandt (EMAIL), Data Science Institute and Department of Computer Science, Columbia University, New York, NY 10025, USA; Matthew D. Hoffman (EMAIL), Adobe Research, Adobe Systems Incorporated, 601 Townsend Street, San Francisco, CA 94103, USA; David M. Blei (EMAIL), Department of Statistics and Department of Computer Science, Columbia University, New York, NY 10025, USA.
Pseudocode | Yes | Algorithm 1: The Iterate Averaging Stochastic Gradient sampler (IASG).
input: averaging window T = N/S, number of samples M, input for SGD.
for t = 1 to M·T do
    θ_t = θ_{t−1} − ϵ ĝ_S(θ_{t−1})  // perform an SGD step
    if t mod T = 0 then
        µ_{t/T} = (1/T) Σ_{t′=0}^{T−1} θ_{t−t′}  // average the T most recent iterates
    end
end
output: return samples {µ_1, . . . , µ_M}.
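The IASG procedure quoted in this row can be sketched as a short Python routine. This is an illustrative sketch, not the authors' code: the function name `iasg`, the toy quadratic objective, and the hyperparameter values below are assumptions, chosen only to show constant-rate SGD with iterate averaging in action.

```python
import numpy as np

def iasg(grad_minibatch, theta0, eps, T, M, rng):
    """Iterate Averaging Stochastic Gradient sampler (sketch).

    grad_minibatch(theta, rng): stochastic minibatch gradient estimate g_S(theta).
    eps: constant learning rate; T: averaging window (= N/S); M: number of samples.
    Returns M averaged iterates, each treated as one approximate posterior sample.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    window, samples = [], []
    for t in range(1, M * T + 1):
        theta = theta - eps * grad_minibatch(theta, rng)  # constant-rate SGD step
        window.append(theta.copy())
        if t % T == 0:  # every T steps, emit one sample
            samples.append(np.mean(window, axis=0))  # average the T most recent iterates
            window = []
    return samples

# Toy check (assumed setup): quadratic loss 0.5*||theta||^2 with Gaussian
# gradient noise; the averaged iterates should concentrate near the optimum at 0.
rng = np.random.default_rng(0)
noisy_grad = lambda th, rng: th + 0.1 * rng.standard_normal(th.shape)
samples = iasg(noisy_grad, np.ones(2), eps=0.1, T=200, M=5, rng=rng)
```

Averaging over a window of T = N/S iterates is what turns the stationary SGD iterates into (approximate) posterior samples; the constant learning rate keeps the chain stationary rather than converging to a point.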
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | "Real-world data. We first considered the following data sets. The Wine Quality Data Set, containing N = 4,898 instances, 11 features, and one integer output variable (the wine rating). A data set of Protein Tertiary Structure, containing N = 45,730 instances, 8 features, and one output variable. The Skin Segmentation Data Set, containing N = 245,057 instances, 3 features, and one binary output variable. ... To this end, we experimented with a Bayesian multinomial logistic (a.k.a. softmax) regression model with normal priors. ... Real-world data. In all experiments, we applied this model to the MNIST dataset (60,000 training examples, 10,000 test examples, 784 features) and the cover type dataset (500,000 training examples, 81,012 testing examples, 54 features)."
Dataset Splits | Yes | "Real-world data. In all experiments, we applied this model to the MNIST dataset (60,000 training examples, 10,000 test examples, 784 features) and the cover type dataset (500,000 training examples, 81,012 testing examples, 54 features)."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not list any specific software libraries or tools with their version numbers.
Experiment Setup | Yes | "We rescaled the features to unit length and used mini-batch sizes of S = 100, S = 100, and S = 10,000 for the three data sets, respectively. The quadratic regularizer was 1. The constant learning rate was adjusted according to Eq. 15. ... For IASG and SGLD we used a minibatch size of S = 10 and an averaging window of N/S = 1000. The constant learning rate of IASG was ϵ = 0.003, and for SGLD we decreased the learning rate according to the Robbins-Monro schedule ϵ_t = ϵ_0/(1000 + t), where we found ϵ_0 = 10⁻³ to be optimal."
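The two step-size policies quoted in this row can be written out as a minimal sketch. The fractional form ϵ_t = ϵ_0/(1000 + t) is a reconstruction of the garbled Robbins-Monro schedule in the extraction, and the function names are invented for illustration:

```python
def iasg_rate(t):
    """Constant step size quoted for IASG (same at every step t)."""
    return 0.003

def sgld_rate(t, eps0=1e-3):
    """Robbins-Monro decay quoted for SGLD: eps_t = eps0 / (1000 + t)."""
    return eps0 / (1000 + t)

# IASG keeps its step size fixed, so the iterates remain a stationary
# process to be averaged; SGLD's step size decays toward zero over time.
sgld_rates = [sgld_rate(t) for t in (0, 1000, 10000)]
```

A decaying schedule like this satisfies the usual Robbins-Monro conditions (the steps sum to infinity while their squares do not), whereas the constant IASG rate deliberately does not converge, which is what the iterate-averaging sampler exploits.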