Model Selection in Bayesian Neural Networks via Horseshoe Priors

Authors: Soumya Ghosh, Jiayu Yao, Finale Doshi-Velez

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we carefully evaluate our proposed modeling and inferential contributions. First, on synthetic data we explore the properties of the horseshoe BNN (HS-BNN) and its different parameterizations. We find that employing a non-centered parameterization is necessary for effective model selection. Next, on several real-world data sets we vet both the model selection effects and the predictive performance of the non-centered HS-BNN and find that it is able to provide model selection benefits without significantly sacrificing predictive performance.
Researcher Affiliation | Collaboration | Soumya Ghosh (EMAIL), MIT-IBM Watson AI Lab and Center for Computational Health, IBM Research, Cambridge, MA 02142; Jiayu Yao (EMAIL) and Finale Doshi-Velez (EMAIL), School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138.
Pseudocode | Yes | Algorithm 1: Training with Standard and Regularized HS-BNNs
    1: Input: Model p(D, θ), variational approximation q(θ | φ), number of iterations T.
    2: Output: Variational parameters φ.
    3: Initialize variational parameters φ.
    4: for T iterations do
    5:     Update φ_c, φ_κ, φ_γ, {φ_Bl}_l, {φ_υl}_l ← ADAM(L(φ)).
    6:     for all hidden layers l do
    7:         Conditioned on φ_Bl, φ_υl, update φ_ϑl, φ_λkl using fixed-point updates (Equation 9).
    8:     end for
    9:     Conditioned on φ_κ, update φ_ρκ via the corresponding fixed-point update.
    10: end for
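The algorithm alternates a gradient (ADAM) step on the variational objective with closed-form fixed-point updates, per layer and then globally. A minimal sketch of that control flow; the update functions below are hypothetical stand-ins (a random perturbation and a toy contraction x ← 0.5x + 1), not the paper's actual updates:

```python
import random

def adam_step(phi):
    # Stand-in for the ADAM update of the variational parameters
    # (phi_c, phi_kappa, phi_gamma, {phi_Bl}, {phi_upsilon_l}); here
    # just a small random perturbation of every parameter.
    return {k: v - 0.005 * random.uniform(-1, 1) for k, v in phi.items()}

def fixed_point_layer_update(phi, layer):
    # Stand-in for the per-layer fixed-point updates of phi_vartheta_l
    # and phi_lambda_kl (Equation 9 in the paper); a toy contraction.
    key = f"vartheta_{layer}"
    phi[key] = phi.get(key, 0.0) * 0.5 + 1.0
    return phi

def fixed_point_global_update(phi):
    # Stand-in for the fixed-point update of phi_rho_kappa.
    phi["rho_kappa"] = phi.get("rho_kappa", 0.0) * 0.5 + 1.0
    return phi

def train(phi, n_layers, T):
    for _ in range(T):
        phi = adam_step(phi)                       # step 5
        for layer in range(n_layers):              # steps 6-8
            phi = fixed_point_layer_update(phi, layer)
        phi = fixed_point_global_update(phi)       # step 9
    return phi
```

The point of the structure is that the ADAM step and the fixed-point steps touch disjoint blocks of φ within one outer iteration, with the fixed-point updates conditioned on the freshly updated gradient parameters.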
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | Yes | We preprocessed the images in the MNIST digits data set by dividing the pixel values with 126. ... We also experimented with a gesture recognition data set (Song et al., 2011) that consists of 24 unique aircraft handling signals performed by 20 different subjects... We next apply our HS-BNN to various standard UCI regression data sets.
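The quoted MNIST preprocessing is a single scaling step; a minimal sketch (the divisor 126 is as reported in the paper, not the more common 255):

```python
def preprocess(pixels):
    # Scale raw 0-255 pixel intensities by dividing by 126,
    # as described for the MNIST experiments.
    return [p / 126.0 for p in pixels]
```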
Dataset Splits Yes The error rates reported are a result of averaging over five random 75/25 splits of the data. ... For the smaller data sets we train on a randomly subsampled 90% subset and evaluate on the remainder and repeat this process 20 times. For Protein we perform five replications and for Year we evaluate on a single split. ... For all but the year data set, we report results from 5 trials each trained on a random 90/10 split of the data.
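The repeated random-split protocol quoted above can be sketched as follows; the seeding and shuffling details are assumptions, since the paper does not specify them:

```python
import random

def random_splits(n, n_trials=20, train_frac=0.9, seed=0):
    # Yield (train_indices, test_indices) pairs for repeated random
    # splits, as in "train on a randomly subsampled 90% subset and
    # evaluate on the remainder and repeat this process 20 times".
    rng = random.Random(seed)
    cut = int(n * train_frac)
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:cut], idx[cut:]
```

For the five 75/25 classification splits one would call `random_splits(n, n_trials=5, train_frac=0.75)`.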
Hardware Specification | Yes | Experiments were performed on a 2.5 GHz Intel Core i7, with 16GB of RAM.
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2015) for optimization, but does not specify any software libraries or packages with version numbers used for implementation.
Experiment Setup | Yes | In our experiments, unless otherwise mentioned we use a learning rate of 0.005. For regression problems we employ a Gaussian likelihood with an unknown precision γ, p(y_n | f(W, x_n), γ) = N(y_n | f(W, x_n), γ⁻¹). We place a vague prior on the precision, γ ~ Gamma(6, 6), and approximate the posterior over γ using another Gamma distribution. For classification problems we use a Categorical distribution parameterized by S(f(W, x_n)), where S is the softmax transformation. ... For HS-BNN we used Adam with a learning rate of 0.005 and 500 epochs. ... All experiments used a batch size of 128, and b_g = 10⁻⁵. ... We used learning rate 0.0002, and batch size 32, and trained the network for 2000 episodes.
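The regression likelihood and precision prior quoted above can be written out as a log joint. A minimal sketch, with the densities written from the standard formulas; the shape-rate parameterization of Gamma(6, 6) is an assumption, since the paper does not state which convention it uses:

```python
import math

def gaussian_logpdf(y, mean, precision):
    # log N(y | mean, precision^{-1})
    return 0.5 * (math.log(precision) - math.log(2 * math.pi)
                  - precision * (y - mean) ** 2)

def gamma_logpdf(x, shape, rate):
    # log Gamma(x | shape, rate), shape-rate parameterization
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)

def log_joint(ys, preds, gamma):
    # log p(y | f(W, x), gamma) + log p(gamma), with gamma ~ Gamma(6, 6)
    ll = sum(gaussian_logpdf(y, f, gamma) for y, f in zip(ys, preds))
    return ll + gamma_logpdf(gamma, 6.0, 6.0)
```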