Do Bayesian Neural Networks Actually Behave Like Bayesian Models?

Authors: Gábor Pituk, Vik Shirvaikar, Tom Rainforth

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically investigate how well popular approximate inference algorithms for Bayesian Neural Networks (BNNs) respect the theoretical properties of Bayesian belief updating. We find strong evidence on synthetic regression and real-world image classification tasks that common BNN algorithms such as variational inference, Laplace approximation, SWAG, and SGLD fail to update in a consistent manner, forget about old data under sequential updates, and violate the predictive coherence properties that would be expected of Bayesian methods. These observed behaviors imply that care should be taken when treating BNNs as true Bayesian models, particularly when using them beyond static prediction settings, such as for active, continual, or transfer learning.
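The consistency property at stake can be made concrete with exact Bayesian inference in a conjugate model, where sequential and batch updating provably coincide. The following is an illustrative sketch (our own toy example, not from the paper) using a Gaussian mean model with known unit noise:

```python
import numpy as np

def gaussian_update(prior_mean, prior_var, data):
    """Posterior over the mean after observing `data` with unit observation noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * (prior_mean / prior_var + np.sum(data))
    return post_mean, post_var

rng = np.random.default_rng(0)
d1, d2 = rng.normal(2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)

# Sequential: the posterior after D1 becomes the prior for D2.
m, v = gaussian_update(0.0, 10.0, d1)
m_seq, v_seq = gaussian_update(m, v, d2)

# Batch: a single update on D1 and D2 together.
m_batch, v_batch = gaussian_update(0.0, 10.0, np.concatenate([d1, d2]))

# Exact Bayesian updating makes these identical.
assert np.isclose(m_seq, m_batch) and np.isclose(v_seq, v_batch)
```

Approximate BNN posteriors generally break this equality, which is precisely the kind of failure the paper measures.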
Researcher Affiliation | Academia | Gábor Pituk, Vik Shirvaikar, and Tom Rainforth; Department of Statistics, University of Oxford, Oxford, UK.
Pseudocode | No | The paper describes algorithms such as Hamiltonian Monte Carlo, Variational Inference, Laplace Approximation, SWAG, and SGLD in Section B, but these are explained in descriptive text and do not appear as structured pseudocode blocks or clearly labeled algorithm sections.
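For readers who want a concrete version of one of these algorithms, here is a minimal SGLD loop (Welling & Teh, 2011) sampling the posterior over a Gaussian mean; the model, step size, and iteration counts are our illustrative choices, not settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, 200)  # unit-variance likelihood, N(0, 10) prior on the mean

def grad_log_post(theta):
    # grad log p(theta | data) = grad log prior + sum of grad log likelihoods
    return -theta / 10.0 + np.sum(data - theta)

eps = 1e-4          # step size (held constant here; SGLD proper decays it)
theta, samples = 0.0, []
for t in range(5000):
    noise = rng.normal(0.0, np.sqrt(eps))
    theta += 0.5 * eps * grad_log_post(theta) + noise
    if t >= 1000:   # discard burn-in
        samples.append(theta)

# For this conjugate model the analytic posterior mean is available to compare against.
post_mean = np.sum(data) / (len(data) + 1.0 / 10.0)
```

This toy version uses full-batch gradients; the "stochastic" in SGLD comes from minibatch gradient estimates, which the sketch omits for clarity.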
Open Source Code | Yes | We make our fork of their codebase available at github.com/pitukg/bnn_seq_vi/tree/master/bnn_hmc.
Open Datasets | Yes | We find on synthetic regression tasks and the CIFAR and IMDB image and text classification settings of Izmailov et al. (2021b) that BNNs fail to preserve key features of Bayesian inference.
Dataset Splits | Yes | We partition our synthetic regression dataset into N = 5 equal groups based on the x value, and run sequential approximate inference. We use the CIFAR-10 dataset for this experiment, and consider taking two random subsets of 4080 images each: a labeled split (x, y) and an unlabeled split x. We randomly split the training sets into two splits D(1) and D(2).
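The synthetic-regression split protocol described above (N = 5 equal groups ordered by x) can be sketched as follows; the data-generating process and variable names are our own illustrative choices, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 100)
y = np.sin(x) + 0.1 * rng.normal(size=100)

order = np.argsort(x)               # group by x value, not at random
groups = np.array_split(order, 5)   # five equal-sized partitions
splits = [(x[idx], y[idx]) for idx in groups]

# Each group covers a contiguous range of x, so sequential inference
# sees the input space one region at a time.
assert all(len(xs) == 20 for xs, _ in splits)
```

Splitting by x value (rather than uniformly at random) is what makes sequential updating informative here: each new group carries genuinely new information about a previously unseen input region.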
Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU/CPU models, memory details) used for its own experiments. It mentions other researchers' use of "hundreds of Tensor Processing Units" when discussing HMC, but not for the experiments conducted in this paper.
Software Dependencies | Yes | To carry out our experiments we use NumPyro (Phan et al., 2019; Bingham et al., 2019), a probabilistic programming library in Python built on JAX (Bradbury et al., 2018). Our SWAG implementation relies on the Optax SWAG library (activatedgeek, 2023).
Experiment Setup | Yes | We follow the hyper-parameters from Table 4 of Izmailov et al. (2021b). We use two fully connected BNN architectures with hidden layers of size 32, 32, 16, and 128, 256, 128, 64, respectively. We pick β = 0.325 for the small network and β = 0.05 for the larger network... We pick λ = 300 for the small network and λ = 2000 for the large network... ...η = 0.02 for the small network and η = 0.01 for the larger network.
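For anyone reimplementing the two architectures, the settings quoted above can be collected into a plain configuration mapping; the key names are our own, while the values are taken verbatim from the excerpt:

```python
# Hyper-parameters reported for the two fully connected BNN architectures.
# Key names ("small"/"large", "hidden", "beta", "lam", "eta") are our labels;
# the numeric values are quoted from the paper's setup description.
CONFIGS = {
    "small": {"hidden": (32, 32, 16),        "beta": 0.325, "lam": 300,  "eta": 0.02},
    "large": {"hidden": (128, 256, 128, 64), "beta": 0.05,  "lam": 2000, "eta": 0.01},
}
```

The excerpt elides the definitions of β, λ, and η (the "..." gaps), so their roles should be taken from Table 4 of Izmailov et al. (2021b) rather than guessed from this summary.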