Incorporating Unlabelled Data into Bayesian Neural Networks
Authors: Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that the improved prior predictives of self-supervised BNNs translate to improved predictive performance, especially in problem settings with few labelled examples (Section 5). In Table 2, we report the NLL for each BNN when making predictions with different numbers of labelled examples. |
| Researcher Affiliation | Academia | Mrinank Sharma EMAIL University of Oxford, UK; Tom Rainforth EMAIL University of Oxford, UK; Yee Whye Teh EMAIL University of Oxford, UK; Vincent Fortuin EMAIL Helmholtz AI, Munich, Germany; Technical University of Munich, Germany |
| Pseudocode | Yes | Algorithm 1 Self-Supervised BNNs |
| Open Source Code | No | The paper references third-party libraries and their code (e.g., 'Bayesian-Torch' by Krishnan et al., 2022), but it does not provide an explicit statement or a link to the source code for the specific methodology or implementation described in this paper. |
| Open Datasets | Yes | We evaluate the performance of different BNNs on the CIFAR10 and CIFAR100 datasets, which are standard benchmarks within the BNN community. To assess out-of-distribution generalisation, we further evaluate on the CIFAR-10-C dataset (Hendrycks & Dietterich, 2018). Moreover, we evaluate whether these BNNs can detect out-of-distribution inputs from SVHN (Netzer et al., 2011) when trained on CIFAR10. |
| Dataset Splits | Yes | We evaluate the performance of different baselines when conditioning on 50, 500, 5000, and 50000 labels from the training set. For the evaluation protocols, we reserve a validation set of 1000 data points from the test set and evaluate using the remaining 9000 labels. |
| Hardware Specification | Yes | The vast majority of experiments were run on an internal compute cluster using Nvidia Tesla V100 or A100 GPUs. |
| Software Dependencies | No | The paper mentions software components such as the 'LARS optimiser', 'Adam optimiser', and the 'laplace library', but does not specify version numbers for any of these. |
| Experiment Setup | Yes | We use a N(0, 1/τp) prior over the linear parameters θt, and tune τp for each dataset. We use τp = 0.65 for CIFAR10 and τp = 0.6 for CIFAR100. We use weight decay 1e-6 for the base encoder and projection head parameters. We use the LARS optimiser (You et al., 2017), with batch size 1000 and momentum 0.9. We train for 1000 epochs, using a linear warmup cosine annealing learning rate schedule. The warmup starting learning rate for the base encoder parameters is 1e-3 with a maximum learning rate of 0.6. For the variational parameters, the maximum learning rate is 1e-3. |
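The experiment-setup excerpt describes a linear-warmup-then-cosine-annealing learning rate schedule (warmup starting at 1e-3, maximum learning rate 0.6, 1000 training epochs). A minimal sketch of such a schedule is below; note that the warmup length and the function name `warmup_cosine_lr` are illustrative assumptions, not values taken from the paper.

```python
import math

def warmup_cosine_lr(epoch, total_epochs=1000, warmup_epochs=10,
                     start_lr=1e-3, max_lr=0.6):
    """Linear warmup followed by cosine annealing.

    start_lr, max_lr, and total_epochs follow the excerpt's base-encoder
    settings; warmup_epochs is an assumption (the excerpt does not state it).
    """
    if epoch < warmup_epochs:
        # Linearly interpolate from start_lr up to max_lr during warmup.
        frac = epoch / warmup_epochs
        return start_lr + frac * (max_lr - start_lr)
    # Cosine-anneal from max_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))
```

In practice a schedule like this would be wrapped in an optimizer's LR scheduler (e.g. via `torch.optim.lr_scheduler.LambdaLR`); the sketch only shows the per-epoch learning rate computation.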