ELBOing Stein: Variational Bayes with Stein Mixture Inference

Authors: Ola Rønning, Eric Nalisnick, Christophe Ley, Padhraic Smyth, Thomas Hamelryck

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that SMI is more particle efficient than SVGD. We use synthetic and real-world data to show that SMI does not suffer from variance collapse in small- to moderately-sized models such as small BNNs. All experiments are carried out on an NVIDIA Quadro RTX 6000 GPU. Experiments: 6.1 Gaussian variance estimation; 6.2 1D regression with synthetic data; 6.3 UCI regression benchmark; 6.4 MNIST classification. Table 2 summarizes the UCI results, evaluating performance using root mean squared error (RMSE) and negative log-likelihood (NLL).
Researcher Affiliation | Academia | Ola Rønning, Department of Computer Science, University of Copenhagen, EMAIL; Eric Nalisnick, Department of Computer Science, Johns Hopkins University, EMAIL; Christophe Ley, Department of Mathematics, University of Luxembourg, EMAIL; Padhraic Smyth, Department of Computer Science, University of California, Irvine, EMAIL; Thomas Hamelryck, Departments of Computer Science / Biology, University of Copenhagen, EMAIL
Pseudocode | No | The paper describes iterative optimization steps in equations (7) and (9) but does not present a clearly labeled pseudocode or algorithm block. For example, equation (9) gives the particle update

$$\psi_\ell^{t+1} = \psi_\ell^t + \frac{\epsilon}{m} \sum_{i=1}^{m} \left[ k(\psi_i^t, \psi_\ell^t)\, \nabla_{\psi_i} \mathcal{L}(\rho_m^t) + \alpha\, \nabla_{\psi_i} k(\psi_i^t, \psi_\ell^t) \right] \qquad (9)$$
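The update in equation (9) has the standard SVGD form: a kernel-weighted attraction toward high-objective regions plus a repulsive kernel-gradient term. Below is a minimal, illustrative sketch of one such particle step with an RBF kernel, using the score of a standard Gaussian as a toy stand-in for the paper's objective gradient ∇L(ρ); function names, the fixed bandwidth, and the toy target are our assumptions, not the paper's code.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel K[i, l] = exp(-||x_i - x_l||^2 / h).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / h)

def grad_log_density(X):
    # Toy stand-in for the objective gradient: score of N(0, I), i.e. -x.
    return -X

def particle_update(X, eps=0.05, alpha=1.0, h=1.0):
    # One SVGD-style step as in equation (9): attractive term
    # sum_i k(x_i, x_l) * grad_i plus repulsive term sum_i grad_{x_i} k(x_i, x_l),
    # averaged over the m particles.
    m = X.shape[0]
    K = rbf_kernel(X, h)
    attract = K @ grad_log_density(X)
    # grad_{x_i} k(x_i, x_l) = (2/h) (x_l - x_i) k(x_i, x_l), summed over i:
    repulse = (2.0 / h) * (K.sum(1, keepdims=True) * X - K @ X)
    return X + eps / m * (attract + alpha * repulse)
```

In practice the bandwidth h is usually set adaptively (e.g., the median heuristic) rather than fixed as here.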
Open Source Code | Yes | We provide an open-source implementation (under an Apache version 2 license) of SMI, called Stein VI, in the deep probabilistic programming language NumPyro (Phan et al., 2019). All experiments use our publicly available Stein VI library, and we provide the source code for the experiments at https://github.com/aleatory-science/smi_experiments.
Open Datasets | Yes | Sections 6.3 (UCI regression benchmark) and 6.4 (MNIST classification). We consider the UCI regression benchmark with Standard and Gap10 splits. We apply 2 and 3 hidden-layer Bayesian Neural Networks (BNNs) to the MNIST dataset (LeCun et al., 2010). Table 7: summary statistics for the standard UCI benchmark datasets with train-test splits from Hernández-Lobato & Adams (2015) and Gap10 benchmark datasets adapted from Foong et al. (2019) to use 10% for testing instead of 33%.
Dataset Splits | Yes | Standard UCI uses ordinary 10% test splits (Mukhoti et al., 2018). Gap10 sorts each feature dimension to create splits (Foong et al., 2019): the middle 10% of data is used for testing, while the tails are used for training. Table 5 gives the evaluation interval and data size (|D|) of the wave datasets; all data points are drawn uniformly from the evaluation interval, and the Between and Entire regions contain points outside the clusters used for inference. In: [-1.5, 0.5] and [1.3, 1.7], |D| = 20; Between: [-0.5, 1.3], |D| = 60; Entire: [-2, 2], |D| = 120.
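The Gap10 construction described above (sort by one feature, hold out the middle 10%) can be sketched in a few lines of NumPy; the function name and the exact 45%-55% slicing are our illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def gap10_split(X, y, feature):
    # Sort examples by the chosen feature dimension, hold out the middle
    # 10% for testing, and use the two tails for training.
    order = np.argsort(X[:, feature])
    n = len(order)
    start, stop = int(0.45 * n), int(0.55 * n)  # middle 10% of the sorted data
    test = order[start:stop]
    train = np.concatenate([order[:start], order[stop:]])
    return X[train], y[train], X[test], y[test]
```

Repeating this once per feature dimension yields one Gap10 split per feature, which is how such gap benchmarks probe in-between uncertainty.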
Hardware Specification | Yes | All experiments are carried out on an NVIDIA Quadro RTX 6000 GPU.
Software Dependencies | No | The paper mentions NumPyro as the deep probabilistic programming language used for the open-source implementation, but it does not specify a version number for NumPyro or any other key software components used in the experiments.
Experiment Setup | Yes | Optimization is performed using the Adam optimizer for SVGD and ASVGD and Adagrad for SMI, each with a learning rate of 0.05. We run the optimization for 60,000 steps, sufficient for all three methods to achieve convergence. We use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001. We run SVGD, ASVGD, and SMI with five particles for 15,000 steps and OVI for 50,000 steps, sufficient for convergence. We employ the Adam optimizer with a learning rate of 10^-3 for both MAP and OVI. For SVGD, SMI, and ASVGD, we use the Adagrad optimizer with a learning rate of 0.7 for the 2 hidden-layer BNN and 0.8 for the 3 hidden-layer BNN, utilizing five particles in each case. All approaches are trained for 100 epochs with a batch size of 128. We choose the learning rate from [5 * 10^i] for i = 1 to 6 with a grid search on the first split of each dataset.
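The learning-rate selection quoted above is a plain grid search on the first split. A minimal sketch of that selection loop, where `train_and_eval` is a hypothetical stand-in for fitting the model at a given learning rate and returning its validation RMSE:

```python
def grid_search_lr(candidates, train_and_eval):
    # Evaluate each candidate learning rate on the first split and keep
    # the one with the lowest validation RMSE. `train_and_eval` is an
    # illustrative placeholder, not a function from the paper's code.
    scores = {lr: train_and_eval(lr) for lr in candidates}
    return min(scores, key=scores.get)
```

The selected rate would then be reused for the remaining splits of that dataset, as the paper's protocol implies.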