An Optimization-centric View on Bayes' Rule: Reviewing and Generalizing Variational Inference

Authors: Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We explore applications of GVI posteriors, and show that they can be used to improve robustness and posterior marginals on Bayesian Neural Networks and Deep Gaussian Processes. ... Section 6: We demonstrate GVI on two large-scale inference applications: Bayesian Neural Networks (BNNs) and Deep Gaussian Processes (DGPs). ... The results are depicted in Figure 13 and confirm our two main intuitions about robustness: Firstly, the robust scoring rule provides a significant performance improvement. Secondly, the smaller value of γ (which will be closer to the log score) generally outperforms the larger value of γ, though both choices are equally good in many data sets.
Researcher Affiliation Academia Jeremias Knoblauch (EMAIL), The Alan Turing Institute and Dept. of Statistics, University of Warwick, Coventry, CV4 7AL, UK; Jack Jewson (EMAIL), The Alan Turing Institute and Dept. of Statistics, University of Warwick, Coventry, CV4 7AL, UK; Theodoros Damoulas (EMAIL), The Alan Turing Institute and Depts. of Computer Science & Statistics, University of Warwick, Coventry, CV4 7AL, UK
Pseudocode Yes Algorithm 1: Black Box GVI (BBGVI)
Input: x_{1:n}, π, D, ℓ, Q, h, stopping criterion, κ_0, K, S, learning rate; t = 0
done ← False
while not done do
    // STEP 1: Get a subsample of size K from x_{1:n}
    ρ_{1:K} ← SampleWithoutReplacement(1:n, K); x^{(t)}_{1:K} ← x_{ρ_{1:K}}
    // STEP 2: Sample from q(θ|κ_t) and compute losses
    θ^{(1:S)} ~ q(θ|κ_t) i.i.d.
    ℓ_{i,s} ← ℓ(θ^{(s)}, x^{(t)}_i) ∇_{κ_t} log q(θ^{(s)}|κ_t) for all s = 1, …, S and i = 1, …, K
    ℓ_s ← (n/K) Σ_{i=1}^{K} ℓ_{i,s} for all s = 1, …, S
    // STEP 3: Compute divergence term
    if D(q‖π) admits a closed form then
        ℓ_s ← ℓ_s + ∇_κ D(q‖π) for all s = 1, …, S
    else if D(q‖π) = E_q[ℓ^D_{κ,π}(θ)] then
        ℓ_s ← ℓ_s + ℓ^D_{κ,π}(θ^{(s)}) ∇_{κ_t} log q(θ^{(s)}|κ_t) + ∇_{κ_t} ℓ^D_{κ_t,π}(θ^{(s)}) for all s = 1, …, S
    else if D(q‖π) = τ(E_q[ℓ^D_{κ,π}(θ)]) then
        ℓ_s ← ℓ_s + τ′((1/S) Σ_{s=1}^{S} ℓ^D_{κ,π}(θ^{(s)})) ∇_{κ_t} ℓ^D_{κ_t,π}(θ^{(s)}).
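The per-iteration estimate in Steps 1–3 can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function and argument names (`gvi_objective_estimate`, `q_sample`, `divergence`) are hypothetical, the gradient/score-function terms are omitted, and only the scaled minibatch loss plus a closed-form divergence term is estimated.

```python
import numpy as np

def gvi_objective_estimate(x, q_sample, loss, divergence, S=16, K=32, rng=None):
    """Monte Carlo estimate of the GVI objective (Steps 1-3, value only).

    q_sample(S) -> S draws of theta from q(. | kappa_t)
    loss(theta, x_i) -> scalar loss for one data point
    divergence() -> closed-form D(q || pi)
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(x)
    # STEP 1: subsample K points without replacement
    idx = rng.choice(n, size=min(K, n), replace=False)
    # STEP 2: sample theta and compute the rescaled minibatch loss per sample
    thetas = q_sample(S)
    ell = np.array([(n / len(idx)) * sum(loss(th, x[i]) for i in idx)
                    for th in thetas])
    # STEP 3: average over theta samples and add the divergence term
    return ell.mean() + divergence()
```

In practice this estimate would be differentiated with respect to the variational parameters κ (via the score-function or reparameterization trick) and fed to an optimizer such as ADAM, as the algorithm's learning-rate input suggests.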
Open Source Code Yes All code used for generating the experiments is available from https://github.com/JeremiasKnoblauch/GVIPublic.
Open Datasets Yes We use the same settings, meaning that all experiments use 20,000 iterations of the ADAM optimizer (Kingma and Ba, 2014) with a learning rate of 0.01 and default settings for all other hyperparameters. We perform inference for each of the UCI data sets (Lichman, 2013) after normalization using the RBF kernel with dimension-wise lengthscales, 100 inducing points, with batch sizes of min(1000, n) and Dl = min(Dx, 30).
Dataset Splits Yes Using 50 random splits of the relevant data into training (90%) and test (10%) sets, the inferred models are evaluated predictively on the test sets using the average negative log likelihood (NLL) as well as the average root mean square error (RMSE). ... As before, we use 50 random splits with 90% training and 10% test data to assess predictive performance in terms of negative log likelihood (NLL) and root mean square error (RMSE).
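The stated evaluation protocol (50 random 90%/10% train/test splits, averaging test NLL and RMSE) could be sketched as below; `fit` and `predict_nll_rmse` are hypothetical stand-ins for the model-specific training and scoring routines, not functions from the paper's code.

```python
import numpy as np

def evaluate_splits(X, y, fit, predict_nll_rmse,
                    n_splits=50, test_frac=0.1, seed=0):
    """Average test NLL and RMSE over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    nlls, rmses = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)                # shuffle indices
        n_test = max(1, int(test_frac * n))      # 10% held out
        test, train = perm[:n_test], perm[n_test:]
        model = fit(X[train], y[train])          # train on the 90% split
        nll, rmse = predict_nll_rmse(model, X[test], y[test])
        nlls.append(nll)
        rmses.append(rmse)
    return float(np.mean(nlls)), float(np.mean(rmses))
```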
Hardware Specification No No specific hardware details (like GPU models, CPU models, or cloud instance types) were explicitly mentioned for running the experiments.
Software Dependencies No Our implementation is built on top of that used for the results of Li and Turner (2016) and only changes the objective being optimized. Similarly, all settings and data sets for which the methods are compared are unchanged and taken directly from Li and Turner (2016) and Hernandez-Lobato et al. (2016): We use a single-layer network with 50 ReLU nodes on all experiments. Inference is performed via probabilistic back-propagation (Hernandez-Lobato and Adams, 2015) and the ADAM optimizer (Kingma and Ba, 2014) with its default settings, 500 epochs and a batch size of 32. ... As with the experiments on BNNs in the previous section, we make comparisons as fair as possible by using the GPflow (Matthews et al., 2017) implementation of Salimbeni and Deisenroth (2017).
Experiment Setup Yes We use a single-layer network with 50 ReLU nodes on all experiments. Inference is performed via probabilistic back-propagation (Hernandez-Lobato and Adams, 2015) and the ADAM optimizer (Kingma and Ba, 2014) with its default settings, 500 epochs and a batch size of 32. ... Further, we use the same settings, meaning that all experiments use 20,000 iterations of the ADAM optimizer (Kingma and Ba, 2014) with a learning rate of 0.01 and default settings for all other hyperparameters. We perform inference for each of the UCI data sets (Lichman, 2013) after normalization using the RBF kernel with dimension-wise lengthscales, 100 inducing points, with batch sizes of min(1000, n) and Dl = min(Dx, 30).
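The data-dependent settings quoted above can be collected in a small helper; `dgp_settings` is an illustrative name, and the dictionary simply restates the fixed choices reported in the paper (batch size min(1000, n), latent dimension Dl = min(Dx, 30), 100 inducing points, 20,000 ADAM iterations at learning rate 0.01).

```python
def dgp_settings(n, Dx):
    """Restate the reported DGP experiment settings for a data set with
    n observations and Dx input dimensions (illustrative helper)."""
    return {
        "batch_size": min(1000, n),   # batch sizes of min(1000, n)
        "Dl": min(Dx, 30),            # Dl = min(Dx, 30)
        "num_inducing": 100,          # 100 inducing points
        "iterations": 20000,          # 20,000 ADAM iterations
        "learning_rate": 0.01,        # ADAM learning rate
    }
```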