The Deep Weight Prior
Authors: Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitriy Vetrov, Max Welling
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that dwp improves the performance of Bayesian neural networks when training data are limited, and initialization of weights with samples from dwp accelerates training of conventional convolutional neural networks. |
| Researcher Affiliation | Collaboration | Andrei Atanov (Skolkovo Institute of Science and Technology; Samsung-HSE Laboratory, National Research University Higher School of Economics); Arsenii Ashukha (Samsung AI Center Moscow); Kirill Struminsky (Skolkovo Institute of Science and Technology; National Research University Higher School of Economics); Dmitry Vetrov (Samsung AI Center Moscow; Samsung-HSE Laboratory, National Research University Higher School of Economics); Max Welling (University of Amsterdam; Canadian Institute for Advanced Research) |
| Pseudocode | Yes | Algorithm 1 Stochastic Variational Inference With Implicit Prior Distribution (a hedged code sketch of one such step is given after the table). |
| Open Source Code | Yes | The code is available at https://github.com/bayesgroup/deep-weight-prior |
| Open Datasets | Yes | In our experiments we used MNIST (LeCun et al., 1998), notMNIST (Bulatov, 2011), CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets. |
| Dataset Splits | No | The paper mentions using different sizes of training sets and discusses 'test accuracy' but does not explicitly provide details about train/validation/test splits, specific percentages, or sample counts for each split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper states 'Experiments were implemented using PyTorch (Paszke et al., 2017).' and 'For optimization we used Adam (Kingma & Ba, 2014)'. While PyTorch is named, a specific version number is not provided, and Adam is an optimizer rather than a versioned software dependency. |
| Experiment Setup | Yes | For optimization we used Adam (Kingma & Ba, 2014) with default hyperparameters. On MNIST we used a neural network with two convolutional layers with 32, 128 filters of shape 7x7, 5x5 respectively, followed by one linear layer with 10 neurons. On the CIFAR dataset we used a neural network with four convolutional layers with 128, 256, 256 filters of shape 7x7, 5x5, 5x5 respectively, followed by two fully connected layers with 512 and 10 neurons. We used a max-pooling layer (Nagi et al., 2011) after the first convolutional layer. All layers were divided with leaky ReLU nonlinearities (Nair & Hinton, 2010). We trained prior distributions on a number of source networks which were learned from different initial points on notMNIST and CIFAR-100 datasets for the MNIST and CIFAR-10 experiments respectively. Appendix F and H provide further architectural details and training parameters such as '300 epochs, Adam optimizer with linear learning rate decay from 1e-3 to 0.' A PyTorch sketch of the MNIST setup is given below the table. |
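As a rough illustration of what Algorithm 1 computes, below is a minimal sketch of one stochastic variational inference step with an implicit, VAE-style prior over convolutional kernels. It assumes a reparameterized variational posterior q(w), a prior decoder p(w|z), and an auxiliary reverse model r(z|w); all function names and signatures are hypothetical and are not taken from the authors' repository.

```python
# Minimal sketch of one SVI step with an implicit (VAE-style) prior over
# kernels, in the spirit of Algorithm 1. All names here are hypothetical.
import math
import torch
import torch.nn.functional as F

def diag_gaussian_log_prob(x, mu, log_sigma):
    # Log-density of a diagonal Gaussian, summed over all dimensions.
    return (-0.5 * ((x - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum()

def elbo_estimate(x, y, q_sample, q_log_prob, prior_decoder, prior_encoder, predict):
    """Single-sample estimate of the auxiliary ELBO.

    q_sample()       -> reparameterized sample of the kernels w ~ q(w)
    q_log_prob(w)    -> log q(w)
    prior_encoder(w) -> (mu, log_sigma) of the reverse model r(z | w)
    prior_decoder(z) -> (mu, log_sigma) of the prior decoder p(w | z)
    predict(x, w)    -> class logits computed with the sampled kernels w
    """
    w = q_sample()
    data_term = -F.cross_entropy(predict(x, w), y)     # E_q log p(y | x, w)

    # Lower bound on log p(w) under the implicit prior:
    # log p(w) >= E_{r(z|w)} [ log p(w | z) + log p(z) - log r(z | w) ]
    mu_z, log_sigma_z = prior_encoder(w)
    z = mu_z + log_sigma_z.exp() * torch.randn_like(mu_z)
    log_r = diag_gaussian_log_prob(z, mu_z, log_sigma_z)
    log_pz = diag_gaussian_log_prob(z, torch.zeros_like(z), torch.zeros_like(z))
    mu_w, log_sigma_w = prior_decoder(z)
    log_pwz = diag_gaussian_log_prob(w, mu_w, log_sigma_w)

    kl_upper_bound = q_log_prob(w) - (log_pwz + log_pz - log_r)
    return data_term - kl_upper_bound                   # maximize with Adam
```

Replacing log p(w) with this auxiliary bound turns the intractable KL term into a quantity that can be estimated from reparameterized samples, which is what makes an implicit prior usable inside standard stochastic variational inference.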
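For concreteness, here is a minimal PyTorch sketch of the MNIST architecture and training configuration described in the Experiment Setup row. The filter counts, kernel sizes, nonlinearity, pooling placement, optimizer, and learning-rate schedule follow the quoted text; padding, stride, pooling size, and the exact decay implementation are assumptions.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Two conv layers (32 filters of 7x7, 128 filters of 5x5), max-pooling
    after the first, leaky-ReLU nonlinearities, and a 10-way linear head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7),    # 28x28 -> 22x22 (no padding assumed)
            nn.MaxPool2d(2),                    # 22x22 -> 11x11
            nn.LeakyReLU(),
            nn.Conv2d(32, 128, kernel_size=5),  # 11x11 -> 7x7
            nn.LeakyReLU(),
        )
        self.classifier = nn.Linear(128 * 7 * 7, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Training configuration quoted from the appendix: Adam with default
# hyperparameters, 300 epochs, linear learning-rate decay from 1e-3 to 0
# (assumed here to be stepped once per epoch via LambdaLR).
model = SmallConvNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda epoch: 1.0 - epoch / 300)
```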