Incorporating Unlabelled Data into Bayesian Neural Networks

Authors: Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

TMLR 2024

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting response (verbatim quotes from the paper in quotation marks).
Research Type: Experimental. "We empirically demonstrate that the improved prior predictives of self-supervised BNNs translate to improved predictive performance, especially in problem settings with few labelled examples (Section 5). In Table 2, we report the NLL for each BNN when making predictions with different numbers of labelled examples."
Researcher Affiliation: Academia. Mrinank Sharma, University of Oxford, UK; Tom Rainforth, University of Oxford, UK; Yee Whye Teh, University of Oxford, UK; Vincent Fortuin, Helmholtz AI, Munich, Germany and Technical University of Munich, Germany.
Pseudocode: Yes. "Algorithm 1: Self-Supervised BNNs"
Open Source Code: No. The paper references third-party libraries and their code (e.g., Bayesian-Torch; Krishnan et al., 2022), but it provides no explicit statement of, or link to, source code for the specific methodology described in the paper.
Open Datasets: Yes. "We evaluate the performance of different BNNs on the CIFAR10 and CIFAR100 datasets, which are standard benchmarks within the BNN community. To assess out-of-distribution generalisation, we further evaluate on the CIFAR-10-C dataset (Hendrycks & Dietterich, 2018). Moreover, we evaluate whether these BNNs can detect out-of-distribution inputs from SVHN (Netzer et al., 2011) when trained on CIFAR10."
Dataset Splits: Yes. "We evaluate the performance of different baselines when conditioning on 50, 500, 5000, and 50000 labels from the training set. For the evaluation protocols, we reserve a validation set of 1000 data points from the test set and evaluate using the remaining 9000 labels."
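The quoted split protocol can be sketched as follows. This is a minimal illustration only: the function name, the random-number seeding, and the use of uniform random subsampling are assumptions not specified in the quote.

```python
import numpy as np

def make_splits(n_train=50000, n_test=10000, n_labels=500, n_val=1000, seed=0):
    """Sketch of the quoted protocol (seeding/sampling are assumptions):
    subsample `n_labels` training labels, and reserve `n_val` test points
    as a validation set, evaluating on the remaining test points."""
    rng = np.random.default_rng(seed)
    # Condition on a random subset of the training labels (50/500/5000/50000).
    train_idx = rng.choice(n_train, size=n_labels, replace=False)
    # Carve the validation set out of the test set; evaluate on the rest.
    test_perm = rng.permutation(n_test)
    val_idx, eval_idx = test_perm[:n_val], test_perm[n_val:]
    return train_idx, val_idx, eval_idx

train_idx, val_idx, eval_idx = make_splits(n_labels=500)
```

With CIFAR10's 10000 test points, this leaves 9000 points for evaluation, matching the quote.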
Hardware Specification: Yes. "The vast majority of experiments were run on an internal compute cluster using Nvidia Tesla V100 or A100 GPUs."
Software Dependencies: No. The paper mentions software components such as the LARS optimiser, the Adam optimiser, and the laplace library, but does not specify version numbers for any of them.
Experiment Setup: Yes. "We use a N(0, 1/τp) prior over the linear parameters θt, and tune τp for each dataset. We use τp = 0.65 for CIFAR10 and τp = 0.6 for CIFAR100. We use weight decay 1e-6 for the base encoder and projection head parameters. We use the LARS optimiser (You et al., 2017), with batch size 1000 and momentum 0.9. We train for 1000 epochs, using a linear warmup cosine annealing learning rate schedule. The warmup starting learning rate for the base encoder parameters is 1e-3 with a maximum learning rate of 0.6. For the variational parameters, the maximum learning rate is 1e-3."
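The quoted learning-rate schedule (linear warmup into cosine annealing over 1000 epochs, warmup start 1e-3, maximum 0.6) can be sketched as below. The warmup length and the final minimum learning rate are assumptions; the quote specifies only the schedule type and the start/maximum rates.

```python
import math

def lr_at_epoch(epoch, total_epochs=1000, warmup_epochs=10,
                warmup_start_lr=1e-3, max_lr=0.6, min_lr=0.0):
    """Linear warmup followed by cosine annealing, evaluated per epoch.
    `warmup_epochs` and `min_lr` are assumed values, not from the paper."""
    if epoch < warmup_epochs:
        # Linear ramp from the warmup start rate up to the maximum rate.
        frac = epoch / warmup_epochs
        return warmup_start_lr + frac * (max_lr - warmup_start_lr)
    # Cosine decay from max_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For the variational parameters, the same shape would apply with `max_lr=1e-3`, per the quote.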