Is SGD a Bayesian sampler? Well, almost

Authors: Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically investigate this bias by calculating, for a range of architectures and datasets, the probability P_SGD(f|S)... Our main findings are that P_SGD(f|S) correlates remarkably well with P_B(f|S)...
Researcher Affiliation | Academia | Chris Mingard, Department of Chemistry, University of Oxford; Guillermo Valle-Pérez, Department of Physics, University of Oxford; Joar Skalse, Department of Computer Science, University of Oxford; Ard A. Louis, Department of Physics, University of Oxford
Pseudocode | Yes | Algorithm 1: Calculating P_OPT(f|S). Input: DNN N, training data S, test data E, optimiser OPT. Initialise F (the functions found during training). Do n times: re-initialise the weights of N from an i.i.d. Gaussian distribution; train N on S until it reaches 100% training accuracy; record the classification of N on E and save it to F. Initialise A (the frequency and volume of each function). For each distinct f ∈ F: let ρ_f be the frequency of f in F; calculate the probability P_OPT(f|S) = ρ_f/n of f in F; save P_OPT(f|S) to A. Return A.
Open Source Code | No | The paper does not provide an explicit statement or a link to its own source code repository for the methodology described.
Open Datasets | Yes | MNIST: The MNIST database of handwritten numbers (LeCun et al., 1999) was binarised with even numbers classified as 0 and odd numbers as 1. ... Fashion-MNIST: The Fashion-MNIST database (Xiao et al., 2017) was binarised... IMDb movie review dataset: We take the IMDb movie review dataset from Keras. ... We used the version of the dataset and preprocessing technique given here: https://www.kaggle.com/drscarlat/imdb-sentiment-analysis-keras-and-tensorflow Ionosphere Dataset: ... https://archive.ics.uci.edu/ml/datasets/Ionosphere
Dataset Splits | Yes | MNIST: ... Unless otherwise specified, we used |S| = 10000 and |E| = 100. Fashion-MNIST: ... Unless otherwise specified, we used |S| = 10000 and |E| = 100. IMDb movie review dataset: ... Used with |S| = 45000 and |E| = 50. Ionosphere Dataset: ... Used with |S| = 301 and |E| = 50.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications.
Software Dependencies | Yes | Hyperparameters are, unless otherwise specified, the default values in Keras 2.3.0.
Experiment Setup | Yes | For a given optimiser OPT (SGD or one of its variants), a DNN architecture, loss function (cross-entropy (CE) or mean-square error (MSE)), a training set S, and test set E, we repeat the following procedure n times: We sample initial parameters θ_i from an i.i.d. truncated Gaussian distribution P_par(θ_i), and train with the optimiser until the first epoch where the network has 100% training classification accuracy... We chose standard values for batch size, learning rate, etc., if given by the default values in Keras 2.3.0 (e.g. batch size of 32 and learning rate of 0.01 for SGD).
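
The procedure quoted in the Pseudocode and Experiment Setup rows can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the page notes that no source code was released). The toy data S and E, the 1-D linear classifier, the perceptron update standing in for SGD, and the names predict, train_once and estimate_p_opt are all hypothetical. The initialisation here is a plain Gaussian, as in Algorithm 1, rather than the truncated Gaussian mentioned in the setup text. A "function" f is represented as the tuple of predicted labels on E.

```python
import random
from collections import Counter

# Hypothetical toy data (not from the paper): 1-D inputs, linearly separable labels.
S = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # training set: (input, label)
E = [-0.5, 0.0, 0.5]                             # test inputs


def predict(w, b, x):
    """Binary prediction of a 1-D linear classifier."""
    return 1 if w * x + b > 0 else 0


def train_once(rng, max_epochs=1000):
    """One run: Gaussian init, train to 100% training accuracy, classify E."""
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    for _ in range(max_epochs):
        if all(predict(w, b, x) == y for x, y in S):
            # Stop at the first epoch boundary with a perfect fit to S.
            return tuple(predict(w, b, x) for x in E)
        for x, y in S:  # one perceptron epoch, standing in for the optimiser
            err = y - predict(w, b, x)
            w += err * x
            b += err
    raise RuntimeError("did not reach 100% training accuracy")


def estimate_p_opt(n, seed=0):
    """Return {f: rho_f / n}, where f is the tuple of predictions on E."""
    rng = random.Random(seed)
    found = Counter(train_once(rng) for _ in range(n))  # F with multiplicities
    return {f: count / n for f, count in found.items()}


if __name__ == "__main__":
    # Print the estimated distribution, most frequent function first.
    for f, p in sorted(estimate_p_opt(n=1000).items(), key=lambda kv: -kv[1]):
        print(f, p)
```

Swapping train_once for a routine that re-initialises and trains a real network, then returns its predictions on E as a tuple, would leave the counting step unchanged. The estimate is only as fine-grained as n allows: a function found once in n runs is assigned probability 1/n.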