ELBOing Stein: Variational Bayes with Stein Mixture Inference

Authors: Ola Rønning, Eric Nalisnick, Christophe Ley, Padhraic Smyth, Thomas Hamelryck

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that SMI is more particle efficient than SVGD. We use synthetic and real-world data to show that SMI does not suffer from variance collapse in small- to moderately-sized models such as small BNNs. All experiments are carried out on an NVIDIA Quadro RTX 6000 GPU. Experiments: 6.1 Gaussian variance estimation; 6.2 1D regression with synthetic data; 6.3 UCI regression benchmark; 6.4 MNIST classification. Table 2 summarizes the UCI results, evaluating performance using root mean squared error (RMSE) and negative log-likelihood (NLL).
Researcher Affiliation | Academia | Ola Rønning, Department of Computer Science, University of Copenhagen, EMAIL; Eric Nalisnick, Department of Computer Science, Johns Hopkins University, EMAIL; Christophe Ley, Department of Mathematics, University of Luxembourg, EMAIL; Padhraic Smyth, Department of Computer Science, University of California, Irvine, EMAIL; Thomas Hamelryck, Departments of Computer Science / Biology, University of Copenhagen, EMAIL
Pseudocode | No | The paper describes iterative optimization steps in equations (7) and (9) but does not present a clearly labeled pseudocode or algorithm block. For example, equation (9) gives the particle update

$$\psi_\ell^{t+1} = \psi_\ell^t + \frac{\epsilon}{m} \sum_{i=1}^{m} \left[ k(\psi_i^t, \psi_\ell^t)\, \nabla_{\psi_i} \mathcal{L}(\rho_m^t) + \alpha\, \nabla_{\psi_i} k(\psi_i^t, \psi_\ell^t) \right] \qquad (9)$$
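The update in equation (9) has the standard SVGD form: a kernel-weighted attraction toward high-objective regions plus a repulsive kernel-gradient term. Below is a minimal, illustrative sketch of one such particle step with an RBF kernel, using the score of a standard Gaussian as a toy stand-in for the paper's objective gradient ∇L(ρ); function names, the fixed bandwidth, and the toy target are our assumptions, not the paper's code.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel K[i, l] = exp(-||x_i - x_l||^2 / h).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / h)

def grad_log_density(X):
    # Toy stand-in for the objective gradient: score of N(0, I), i.e. -x.
    return -X

def particle_update(X, eps=0.05, alpha=1.0, h=1.0):
    # One SVGD-style step as in equation (9): attractive term
    # sum_i k(x_i, x_l) * grad_i plus repulsive term sum_i grad_{x_i} k(x_i, x_l),
    # averaged over the m particles.
    m = X.shape[0]
    K = rbf_kernel(X, h)
    attract = K @ grad_log_density(X)
    # grad_{x_i} k(x_i, x_l) = (2/h) (x_l - x_i) k(x_i, x_l), summed over i:
    repulse = (2.0 / h) * (K.sum(1, keepdims=True) * X - K @ X)
    return X + eps / m * (attract + alpha * repulse)
```

In practice the bandwidth h is usually set adaptively (e.g., the median heuristic) rather than fixed as here.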
Open Source Code | Yes | We provide an open-source implementation (under an Apache version 2 license) of SMI, called Stein VI, in the deep probabilistic programming language NumPyro (Phan et al., 2019). All experiments use our publicly available Stein VI library, and we provide the source code for the experiments at https://github.com/aleatory-science/smi_experiments.
Open Datasets | Yes | Sections 6.3 (UCI regression benchmark) and 6.4 (MNIST classification). We consider the UCI regression benchmark with Standard and Gap10 splits. We apply 2 and 3 hidden-layer Bayesian Neural Networks (BNNs) to the MNIST dataset (LeCun et al., 2010). Table 7: summary statistics for the standard UCI benchmark datasets with train-test splits from Hernández-Lobato & Adams (2015) and Gap10 benchmark datasets adapted from Foong et al. (2019) to use 10% for testing instead of 33%.
Dataset Splits | Yes | Standard UCI uses ordinary 10% test splits (Mukhoti et al., 2018). Gap10 sorts each feature dimension to create splits (Foong et al., 2019): the middle 10% of data is used for testing, while the tails are used for training. Table 5 gives the evaluation interval and data size (|D|) of the wave datasets; all data points are drawn uniformly from the evaluation interval, and the Between and Entire regions contain points outside the clusters used for inference. In: [-1.5, 0.5] and [1.3, 1.7], |D| = 20; Between: [-0.5, 1.3], |D| = 60; Entire: [-2, 2], |D| = 120.
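The Gap10 construction described above (sort by one feature, hold out the middle 10%) can be sketched in a few lines of NumPy; the function name and the exact 45%-55% slicing are our illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def gap10_split(X, y, feature):
    # Sort examples by the chosen feature dimension, hold out the middle
    # 10% for testing, and use the two tails for training.
    order = np.argsort(X[:, feature])
    n = len(order)
    start, stop = int(0.45 * n), int(0.55 * n)  # middle 10% of the sorted data
    test = order[start:stop]
    train = np.concatenate([order[:start], order[stop:]])
    return X[train], y[train], X[test], y[test]
```

Repeating this once per feature dimension yields one Gap10 split per feature, which is how such gap benchmarks probe in-between uncertainty.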
Hardware Specification | Yes | All experiments are carried out on an NVIDIA Quadro RTX 6000 GPU.
Software Dependencies | No | The paper mentions NumPyro as the deep probabilistic programming language used for the open-source implementation, but it does not specify a version number for NumPyro or any other key software components used in the experiments.
Experiment Setup | Yes | Optimization is performed using the Adam optimizer for SVGD and ASVGD and Adagrad for SMI, each with a learning rate of 0.05. We run the optimization for 60,000 steps, sufficient for all three methods to achieve convergence. We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001. We run SVGD, ASVGD, and SMI with five particles for 15,000 steps and OVI for 50,000 steps, sufficient for convergence. We employ the Adam optimizer with a learning rate of 10^-3 for both MAP and OVI. For SVGD, SMI, and ASVGD, we use the Adagrad optimizer with a learning rate of 0.7 for the 2 hidden-layer BNN and 0.8 for the 3 hidden-layer BNN, utilizing five particles in each case. All approaches are trained for 100 epochs with a batch size of 128. We choose the learning rate from [5 * 10^i] for i = 1 to 6 with a grid search on the first split of each dataset.
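The learning-rate selection quoted above is a plain grid search on the first split. A minimal sketch of that selection loop, where `train_and_eval` is a hypothetical stand-in for fitting the model at a given learning rate and returning its validation RMSE:

```python
def grid_search_lr(candidates, train_and_eval):
    # Evaluate each candidate learning rate on the first split and keep
    # the one with the lowest validation RMSE. `train_and_eval` is an
    # illustrative placeholder, not a function from the paper's code.
    scores = {lr: train_and_eval(lr) for lr in candidates}
    return min(scores, key=scores.get)
```

The selected rate would then be reused for the remaining splits of that dataset, as the paper's protocol implies.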