Greedy Bayesian Posterior Approximation with Deep Ensembles

Authors: Aleksei Tiulpin, Matthew B. Blaschko

TMLR 2022

Reproducibility assessment: each variable is listed with its result and a supporting excerpt from the paper.
Research Type: Experimental. "The performance of our approach is demonstrated on computer vision out-of-distribution detection benchmarks in a range of architectures trained on multiple datasets."
Researcher Affiliation: Academia. "Aleksei Tiulpin (aleksei.tiulpin@oulu.fi), Research Unit of Medical Imaging, Physics and Technology, Faculty of Medicine, University of Oulu, Finland; Matthew B. Blaschko (EMAIL), Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Belgium."
Pseudocode: Yes. "Algorithm 1: Random Greedy algorithm. Algorithm 2: O(k) Random Greedy-based algorithm for training ensembles of neural networks."
Open Source Code: Yes. "The source code of our method is made publicly available at https://github.com/Oulu-IMEDS/greedy_ensembles_training."
Open Datasets: Yes. "We ran our main experiments on CIFAR10, CIFAR100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011) in-distribution datasets. Our OOD detection benchmark included CIFAR10, CIFAR100, DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), LSUN (Yu et al., 2015), Tiny ImageNet (Le & Yang, 2015), Places 365 (Zhou et al., 2017), and Bernoulli noise, Gaussian noise, random blob, and uniform noise images. [...] In addition to the CIFAR and SVHN experiments, we used MNIST (LeCun et al., 1998) with ResNet8."
Dataset Splits: Yes. "We used validation set accuracy (10% of the training data; a randomly chosen stratified split) to select the models when optimizing the marginal gain. The best snapshot, found using the validation data, was then selected for final testing. When selecting the models for evaluation on OOD data, we first evaluated ensembles on the in-distribution test set (Appendix C.2)."
Hardware Specification: Yes. "All our models in the ensembles were trained for 100 epochs using PyTorch (Paszke et al., 2019), each ensemble on a single NVIDIA V100 GPU."
Software Dependencies: No. "All our models in the ensembles were trained for 100 epochs using PyTorch (Paszke et al., 2019) [...] For the synthetic data experiments, we used scikit-learn (Pedregosa et al., 2011)."
Experiment Setup: Yes. "The main training hyper-parameters were adapted from (Maddox et al., 2019) (see Table C2), with additional modifications inspired by (Malinin & Gales, 2018; Smith & Topin, 2019), which helped to train the CIFAR models to state-of-the-art performance in only 100 epochs. We first employed a warm-up of the learning rate (LR) from a value 10 times lower than the initial LR (LRinit in Table C2) for 5 epochs. Subsequently, after 50% of the training budget, we linearly annealed the LR to the value LR × lrscale until 90% of the training budget was reached, after which we kept the LR constant. All models were trained using stochastic gradient descent with momentum of 0.9 and a total batch size of 128. We employed standard training augmentations: horizontal flipping, reflective padding to 34x34, and random crop to 32x32 pixels."
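The Random Greedy pseudocode referenced in Algorithms 1 and 2 selects ensemble members by their marginal gain on a validation metric. As a rough illustration of plain greedy forward selection only (not the authors' randomized algorithm, whose details are in the paper), with array shapes as assumptions:

```python
import numpy as np

def greedy_ensemble_selection(member_probs, y_val, k):
    """Greedily pick k members whose averaged predictions maximize
    validation accuracy (the marginal-gain criterion, simplified).

    member_probs: array of shape (n_members, n_samples, n_classes)
    y_val: integer labels of shape (n_samples,)
    """
    selected = []
    ensemble_sum = np.zeros_like(member_probs[0])
    for _ in range(k):
        best_i, best_acc = None, -1.0
        for i in range(len(member_probs)):
            if i in selected:
                continue
            # Accuracy of the ensemble if member i were added.
            avg = (ensemble_sum + member_probs[i]) / (len(selected) + 1)
            acc = (avg.argmax(axis=1) == y_val).mean()
            if acc > best_acc:
                best_i, best_acc = i, acc
        selected.append(best_i)
        ensemble_sum += member_probs[best_i]
    return selected
```

The authors' Algorithm 1 additionally randomizes the candidate pool at each step, which is what makes the O(k) training variant of Algorithm 2 possible.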
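The synthetic OOD inputs listed under Open Datasets (Bernoulli, Gaussian, and uniform noise images) are straightforward to generate; a minimal NumPy sketch, where the batch size, 32x32 RGB shape, and [0, 1] value range are assumptions matched to CIFAR-sized inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (16, 32, 32, 3)  # batch of CIFAR-sized RGB images (assumed)

# Bernoulli noise: each pixel is 0 or 1 with probability 0.5.
bernoulli = rng.binomial(1, 0.5, size=shape).astype(np.float32)

# Gaussian noise: clipped to the valid [0, 1] pixel range.
gaussian = np.clip(rng.normal(0.5, 0.25, size=shape), 0.0, 1.0)

# Uniform noise: pixels drawn uniformly from [0, 1).
uniform = rng.uniform(0.0, 1.0, size=shape).astype(np.float32)
```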
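The stratified 10% validation split described under Dataset Splits can be reproduced with scikit-learn (which the paper already uses for its synthetic-data experiments); a minimal sketch, with the toy labels and variable names as assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for training labels; in practice these come from e.g. CIFAR10.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.10,   # 10% of the training data held out for validation
    stratify=y,       # preserve class proportions in both splits
    random_state=0,   # a fixed seed for a reproducible split
)
```

`stratify=y` is what makes the split stratified: each class appears in the validation set in the same proportion as in the full training set.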
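The learning-rate schedule from the Experiment Setup row (warm-up from LR/10 for 5 epochs, plateau until 50% of the budget, linear anneal to LR × lrscale until 90%, then constant) can be sketched as a per-epoch function; the default values and the exact interpolation endpoints are assumptions:

```python
def lr_at_epoch(epoch, total_epochs=100, lr_init=0.1, lr_scale=0.1, warmup_epochs=5):
    """Piecewise schedule: warm-up, plateau, linear anneal, constant tail."""
    if epoch < warmup_epochs:
        # Linear warm-up from lr_init / 10 up to lr_init.
        frac = epoch / warmup_epochs
        return lr_init / 10 + frac * (lr_init - lr_init / 10)
    anneal_start = 0.5 * total_epochs
    anneal_end = 0.9 * total_epochs
    if epoch < anneal_start:
        return lr_init  # plateau at the initial LR
    if epoch < anneal_end:
        # Linear anneal from lr_init down to lr_init * lr_scale.
        frac = (epoch - anneal_start) / (anneal_end - anneal_start)
        return lr_init + frac * (lr_init * lr_scale - lr_init)
    return lr_init * lr_scale  # constant for the last 10% of training
```

In a training loop this would be applied per epoch, e.g. by setting the optimizer's LR before each epoch or by wrapping it in PyTorch's `LambdaLR` as a multiplier.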