reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Is SGD a Bayesian sampler? Well, almost

Authors: Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically investigate this bias by calculating, for a range of architectures and datasets, the probability P_SGD(f|S)... Our main findings are that P_SGD(f|S) correlates remarkably well with P_B(f|S)...
Researcher Affiliation	Academia	Chris Mingard, Department of Chemistry, University of Oxford; Guillermo Valle-Pérez, Department of Physics, University of Oxford; Joar Skalse, Department of Computer Science, University of Oxford; Ard A. Louis, Department of Physics, University of Oxford
Pseudocode	Yes	Algorithm 1: Calculating P_OPT(f|S). Input: DNN N, training data S, test data E, optimiser OPT. F ← ∅ {the functions found during training}; do n times: (re-initialise the weights of N from an i.i.d. Gaussian distribution; train N on S until it reaches 100% training accuracy; record the classification of N on E and save it to F); A ← ∅ {the frequency and volume of each function}; for each distinct f ∈ F do: (let ρ_f be the frequency of f in F; calculate the probability P_OPT(f|S) = ρ_f/n of f in F; save P_OPT(f|S) to A); end for; return A.
Open Source Code	No	The paper does not provide an explicit statement or a link to its own source code repository for the methodology described.
Open Datasets	Yes	MNIST: The MNIST database of handwritten numbers (Le Cun et al., 1999) was binarised with even numbers classified as 0 and odd numbers as 1. ... Fashion-MNIST: The Fashion-MNIST database (Xiao et al., 2017) was binarised... IMDb movie review dataset: We take the IMDb movie review dataset from Keras. ... We used the version of the dataset and preprocessing technique given here: https://www.kaggle.com/drscarlat/imdb-sentiment-analysis-keras-and-tensorflow Ionosphere Dataset: ... https://archive.ics.uci.edu/ml/datasets/Ionosphere
Dataset Splits	Yes	MNIST: ... Unless otherwise specified, we used |S| = 10000 and |E| = 100. Fashion-MNIST: ... Unless otherwise specified, we used |S| = 10000 and |E| = 100. IMDb movie review dataset: ... Used with |S| = 45000 and |E| = 50. Ionosphere Dataset: ... Used with |S| = 301 and |E| = 50.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications.
Software Dependencies	Yes	Hyperparameters are, unless otherwise specified, the default values in Keras 2.3.0.
Experiment Setup	Yes	For a given optimiser OPT (SGD or one of its variants), a DNN architecture, loss function (cross-entropy (CE) or mean-square error (MSE)), a training set S, and test set E, we repeat the following procedure n times: We sample initial parameters θ_i, from an i.i.d. truncated Gaussian distribution P_par(θ_i), and train with the optimiser until the first epoch where the network has 100% training classification accuracy... We chose standard values for batch size, learning rate, etc., if given by the default values in Keras 2.3.0 (e.g. batch size of 32 and learning rate of 0.01 for SGD).
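
Illustrative sketch (not from the paper): since the Open Source Code row reports no released code, the following is a minimal Python sketch of the counting step quoted in the Pseudocode row, under stated assumptions. The names `estimate_p_opt`, `train_and_classify`, `toy_train_and_classify`, and `binarise_labels` are hypothetical. The callable `train_and_classify` is assumed to re-initialise the network, train it on S until it reaches 100% training accuracy, and return its predicted labels on the test set E; here it is replaced by a random toy stand-in so the example runs without Keras.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A "function" is identified by the tuple of labels it assigns to the test set E.
Function = Tuple[int, ...]


def binarise_labels(labels: Sequence[int]) -> List[int]:
    """Map even digit labels to 0 and odd ones to 1 (the MNIST binarisation quoted above)."""
    return [int(y) % 2 for y in labels]


def estimate_p_opt(
    train_and_classify: Callable[[int], Sequence[int]], n: int
) -> Dict[Function, float]:
    """Estimate P_OPT(f|S) as rho_f / n, the fraction of n runs that found f.

    `train_and_classify(run)` is a hypothetical stand-in for one full run:
    re-initialise weights, train to 100% training accuracy, classify E.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    counts: Counter = Counter()
    for run in range(n):
        predictions = train_and_classify(run)
        counts[tuple(int(p) for p in predictions)] += 1
    return {f: c / n for f, c in counts.items()}


def toy_train_and_classify(seed: int) -> Function:
    """Toy stand-in (assumption): draws one of three label patterns at random."""
    rng = random.Random(seed)
    candidates = [(0, 0, 0), (0, 1, 0), (1, 1, 1)]
    return rng.choices(candidates, weights=[6, 3, 1])[0]


if __name__ == "__main__":
    probs = estimate_p_opt(toy_train_and_classify, n=1000)
    # Print functions from most to least frequently found.
    for f, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f, round(p, 3))
```

In a real replication, `train_and_classify` would wrap model construction and training (for example with the Keras 2.3.0 defaults quoted in the Experiment Setup row); that part is not sketched here because the page does not give those details.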