Greedy Bayesian Posterior Approximation with Deep Ensembles
Authors: Aleksei Tiulpin, Matthew B. Blaschko
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of our approach is demonstrated on computer vision out-of-distribution detection benchmarks in a range of architectures trained on multiple datasets. |
| Researcher Affiliation | Academia | Aleksei Tiulpin (aleksei.tiulpin@oulu.fi), Research Unit of Medical Imaging, Physics and Technology, Faculty of Medicine, University of Oulu, Finland; Matthew B. Blaschko (EMAIL), Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Belgium |
| Pseudocode | Yes | Algorithm 1: Random Greedy algorithm. Algorithm 2: O(k) Random Greedy-based algorithm for training ensembles of neural networks. |
| Open Source Code | Yes | The source code of our method is made publicly available at https://github.com/Oulu-IMEDS/greedy_ensembles_training. |
| Open Datasets | Yes | We ran our main experiments on CIFAR10, CIFAR100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011) in-distribution datasets. Our OOD detection benchmark included CIFAR10, CIFAR100, DTD (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), LSUN (Yu et al., 2015), Tiny ImageNet (Le & Yang, 2015), Places365 (Zhou et al., 2017), Bernoulli noise images, Gaussian noise, random blob images, and uniform noise images. [...] In addition to the CIFAR and SVHN experiments, we used MNIST (LeCun et al., 1998) with ResNet8. |
| Dataset Splits | Yes | We used validation set accuracy (10% of the training data; randomly chosen stratified split) to select the models when optimizing the marginal gain. The best snapshot, found using the validation data, was then selected for final testing. When selecting the models for evaluation on OOD data, we first evaluated ensembles on the in-distribution test set (Appendix C.2). |
| Hardware Specification | Yes | All our models in the ensembles were trained for 100 epochs using PyTorch (Paszke et al., 2019), each ensemble on a single NVIDIA V100 GPU. |
| Software Dependencies | No | All our models in the ensembles were trained for 100 epochs using PyTorch (Paszke et al., 2019) [...] For the synthetic data experiments, we used scikit-learn (Pedregosa et al., 2011) |
| Experiment Setup | Yes | The main training hyper-parameters were adapted from (Maddox et al., 2019) (see Table C2), but with additional modifications inspired by (Malinin & Gales, 2018; Smith & Topin, 2019), which helped to train the CIFAR models to state-of-the-art performance in only 100 epochs. As such, we first employed a warm-up of the learning rate (LR) from a value 10 times lower than the initial LR (LRinit in Table C2) for 5 epochs. Subsequently, after 50% of the training budget, we linearly annealed the LR to the value of LR lrscale until 90% of the training budget is reached, after which we kept the value of LR constant. All models were trained using stochastic gradient descent with momentum of 0.9 and a total batch size of 128. We employed standard training augmentations: horizontal flipping, reflective padding to 34x34, and random crop to 34x34 pixels. |
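
The learning-rate schedule quoted in the Experiment Setup row can be sketched as a plain function of the epoch. This is a minimal illustration, not the authors' implementation. It assumes that the warm-up is linear, that the LR stays at its initial value between the end of warm-up and 50% of the budget, and that the final LR is the initial LR multiplied by a scale factor. The function name `learning_rate` and the default values `lr_init=0.1` and `lr_scale=0.01` are hypothetical placeholders; the actual values are in Table C2 of the paper, which is not reproduced here.

```python
def learning_rate(epoch: float,
                  total_epochs: int = 100,
                  lr_init: float = 0.1,      # hypothetical value, see Table C2
                  lr_scale: float = 0.01,    # hypothetical value, see Table C2
                  warmup_epochs: int = 5) -> float:
    """Piecewise LR schedule: warm-up, plateau, linear anneal, constant tail."""
    # Assumption: the annealing target is lr_init * lr_scale.
    lr_final = lr_init * lr_scale
    anneal_start = 0.5 * total_epochs   # annealing begins at 50% of the budget
    anneal_end = 0.9 * total_epochs     # and ends at 90% of the budget

    if epoch < warmup_epochs:
        # Assumption: linear warm-up from lr_init / 10 up to lr_init.
        lr_start = lr_init / 10.0
        return lr_start + (lr_init - lr_start) * epoch / warmup_epochs
    if epoch < anneal_start:
        # Assumption: LR is held at lr_init until annealing begins.
        return lr_init
    if epoch < anneal_end:
        # Linear interpolation from lr_init down to lr_final.
        frac = (epoch - anneal_start) / (anneal_end - anneal_start)
        return lr_init + (lr_final - lr_init) * frac
    # Constant LR for the last 10% of the budget.
    return lr_final


if __name__ == "__main__":
    for e in (0, 5, 50, 70, 90, 99):
        print(e, learning_rate(e))
```

In PyTorch, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the ratio `learning_rate(epoch) / lr_init` as the multiplicative factor, with the optimizer created using `lr=lr_init` and `momentum=0.9`.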