Is SGD a Bayesian sampler? Well, almost
Authors: Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate this bias by calculating, for a range of architectures and datasets, the probability $P_{\text{SGD}}(f \mid S)$... Our main findings are that $P_{\text{SGD}}(f \mid S)$ correlates remarkably well with $P_{\text{B}}(f \mid S)$... |
| Researcher Affiliation | Academia | Chris Mingard, Department of Chemistry, University of Oxford; Guillermo Valle-Pérez, Department of Physics, University of Oxford; Joar Skalse, Department of Computer Science, University of Oxford; Ard A. Louis, Department of Physics, University of Oxford |
| Pseudocode | Yes | Algorithm 1: Calculating $P_{\text{OPT}}(f \mid S)$. Input: DNN $N$, training data $S$, test data $E$, optimiser OPT. $F \leftarrow \{\}$ {the functions found during training}. Do $n$ times: re-initialise the weights of $N$ from an i.i.d. Gaussian distribution; train $N$ on $S$ until it reaches 100% training accuracy; record the classification of $N$ on $E$ and save it to $F$. $A \leftarrow \{\}$ {the frequency and volume of each function}. For each distinct $f \in F$ do: let $\rho_f$ be the frequency of $f$ in $F$; calculate the probability $P_{\text{OPT}}(f \mid S) = \rho_f / n$ of $f$ in $F$; save $P_{\text{OPT}}(f \mid S)$ to $A$. End for. Return $A$. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own source code repository for the methodology described. |
| Open Datasets | Yes | MNIST: The MNIST database of handwritten numbers (LeCun et al., 1999) was binarised with even numbers classified as 0 and odd numbers as 1. ... Fashion-MNIST: The Fashion-MNIST database (Xiao et al., 2017) was binarised... IMDb movie review dataset: We take the IMDb movie review dataset from Keras. ... We used the version of the dataset and preprocessing technique given here: https://www.kaggle.com/drscarlat/imdb-sentiment-analysis-keras-and-tensorflow Ionosphere Dataset: ... https://archive.ics.uci.edu/ml/datasets/Ionosphere |
| Dataset Splits | Yes | MNIST: ... Unless otherwise specified, we used $\lvert S \rvert = 10000$ and $\lvert E \rvert = 100$. Fashion-MNIST: ... Unless otherwise specified, we used $\lvert S \rvert = 10000$ and $\lvert E \rvert = 100$. IMDb movie review dataset: ... Used with $\lvert S \rvert = 45000$ and $\lvert E \rvert = 50$. Ionosphere Dataset: ... Used with $\lvert S \rvert = 301$ and $\lvert E \rvert = 50$. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | Yes | Hyperparameters are, unless otherwise specified, the default values in Keras 2.3.0. |
| Experiment Setup | Yes | For a given optimiser OPT (SGD or one of its variants), a DNN architecture, loss function (cross-entropy (CE) or mean-square error (MSE)), a training set $S$, and test set $E$, we repeat the following procedure $n$ times: We sample initial parameters $\theta_i$ from an i.i.d. truncated Gaussian distribution $P_{\text{par}}(\theta_i)$, and train with the optimiser until the first epoch where the network has 100% training classification accuracy... We chose standard values for batch size, learning rate, etc., if given by the default values in Keras 2.3.0 (e.g. batch size of 32 and learning rate of 0.01 for SGD). |
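
Since no source code is released, the following is a minimal sketch of the procedure quoted in the Pseudocode and Experiment Setup rows, not the authors' implementation. It identifies a function by the labels a trained model assigns to the test set, repeats training from fresh random initialisations, and reports the relative frequency of each function. Everything named here is an assumption made for illustration: the helper `estimate_p_opt`, the toy data `S` and `E`, and the pure-Python perceptron `train_perceptron`, which stands in for the Keras DNN and optimiser and uses a plain rather than truncated Gaussian initialisation.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Function = Tuple[int, ...]  # a function is identified by its labels on E

# Hypothetical toy data: a linearly separable training set S and a test set E.
S: List[Tuple[Point, int]] = [
    ((0.0, 0.0), 0), ((0.2, 0.3), 0), ((1.0, 1.0), 1), ((0.9, 0.8), 1),
]
E: List[Point] = [(0.5, 0.5), (0.1, 0.9), (0.9, 0.1)]


def train_perceptron(rng: random.Random, lr: float = 0.1,
                     max_epochs: int = 10_000) -> List[int]:
    """Stand-in for 'train N on S': a perceptron, not a DNN."""
    # Re-initialise the weights (two inputs plus a bias) from an i.i.d. Gaussian.
    w = [rng.gauss(0.0, 1.0) for _ in range(3)]

    def predict(x: Point) -> int:
        return 1 if w[0] * x[0] + w[1] * x[1] + w[2] > 0 else 0

    for _ in range(max_epochs):
        for x, y in S:
            err = y - predict(x)  # zero when the point is classified correctly
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            w[2] += lr * err
        # Stop at the first epoch with 100% training accuracy.
        if all(predict(x) == y for x, y in S):
            return [predict(x) for x in E]
    raise RuntimeError("did not reach 100% training accuracy")


def estimate_p_opt(train_and_classify: Callable[[random.Random], Sequence[int]],
                   n: int, seed: int = 0) -> Dict[Function, float]:
    """Estimate P_OPT(f|S) as the fraction of n runs that end in function f."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n):
        counts[tuple(train_and_classify(rng))] += 1
    return {f: c / n for f, c in counts.items()}


if __name__ == "__main__":
    probs = estimate_p_opt(train_perceptron, n=1000)
    for f, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f, p)
```

Passing the training routine in as a callable keeps the frequency estimate independent of the model, so the perceptron could be swapped for a routine that trains a real network on a binarised dataset and returns its test-set labels.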