Model Selection in Bayesian Neural Networks via Horseshoe Priors
Authors: Soumya Ghosh, Jiayu Yao, Finale Doshi-Velez
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we carefully evaluate our proposed modeling and inferential contributions. First, on synthetic data we explore the properties of the horseshoe BNN (HS-BNN) and its different parameterizations. We find that employing a non-centered parameterization is necessary for effective model selection. Next, on several real-world data sets we vet both the model selection effects and the predictive performance of the non-centered HS-BNN and find that it is able to provide model selection benefits without significantly sacrificing predictive performance. |
| Researcher Affiliation | Collaboration | Soumya Ghosh EMAIL MIT-IBM Watson AI Lab and Center for Computational Health, IBM Research, Cambridge, MA 02142. Jiayu Yao EMAIL Finale Doshi-Velez EMAIL School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138. |
| Pseudocode | Yes | Algorithm 1 Training with Standard and Regularized HS-BNNs. 1: Input: Model p(D, θ), variational approximation q(θ | φ), number of iterations T. 2: Output: Variational parameters φ. 3: Initialize variational parameters φ. 4: for T iterations do 5: Update φc, φκ, φγ, {φBl}l, {φυl}l via ADAM(L(φ)). 6: for all hidden layers l do 7: Conditioned on φBl, φυl, update φϑl, φλkl using fixed-point updates (Equation 9). 8: end for 9: Conditioned on φκ, update φρκ via the corresponding fixed-point update. 10: end for |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We preprocessed the images in the MNIST digits data set by dividing the pixel values with 126. ... We also experimented with a gesture recognition data set (Song et al., 2011) that consists of 24 unique aircraft handling signals performed by 20 different subjects... We next apply our HS-BNN to various standard UCI regression data sets. |
| Dataset Splits | Yes | The error rates reported are a result of averaging over five random 75/25 splits of the data. ... For the smaller data sets we train on a randomly subsampled 90% subset and evaluate on the remainder and repeat this process 20 times. For Protein we perform five replications and for Year we evaluate on a single split. ... For all but the year data set, we report results from 5 trials each trained on a random 90/10 split of the data. |
| Hardware Specification | Yes | Experiments were performed on a 2.5 GHz Intel Core i7, with 16GB of RAM. |
| Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2015) for optimization, but does not specify any software libraries or packages with version numbers used for implementation. |
| Experiment Setup | Yes | In our experiments, unless otherwise mentioned we use a learning rate of 0.005. For regression problems we employ a Gaussian likelihood with an unknown precision γ, p(yn | f(W, xn), γ) = N(yn | f(W, xn), γ⁻¹). We place a vague prior on the precision, γ ∼ Gamma(6, 6), and approximate the posterior over γ using another Gamma distribution. For classification problems we use a Categorical distribution parameterized by S(f(W, xn)), where S is the softmax transformation. ... For HS-BNN we used Adam with a learning rate of 0.005 and 500 epochs. ... All experiments used a batch size of 128, and b_g = 10⁻⁵. ... We used learning rate 0.0002, and batch size 32, and trained the network for 2000 episodes. |
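The pseudocode row above describes training a horseshoe BNN in the non-centered parameterization, where each weight is a product of a standard-normal "raw" weight and layer/unit-level half-Cauchy scales. The sketch below (not the authors' code; `sample_hs_layer`, `b_global`, and the node-wise scale layout are our assumptions, with `b_global` set to the paper's b_g = 10⁻⁵) illustrates how the multiplicative shrinkage can drive whole hidden units toward zero, which is the mechanism behind the reported model-selection effect.

```python
# Hypothetical sketch of a non-centered horseshoe prior over one layer's
# weights. Not the authors' implementation: names and layout are assumed.
import numpy as np

rng = np.random.default_rng(0)

def sample_hs_layer(n_in, n_out, b_global=1e-5, rng=rng):
    """Draw one layer's weights, non-centered: w = tau * lambda_k * beta."""
    # Layer-wide global scale tau ~ C+(0, b_global); |Cauchy| is half-Cauchy.
    tau = np.abs(rng.standard_cauchy()) * b_global
    # One local scale per hidden unit, lambda_k ~ C+(0, 1), shared across
    # that unit's incoming weights (node-wise shrinkage enables unit pruning).
    lam = np.abs(rng.standard_cauchy(size=(n_out, 1)))
    # Non-centered "raw" weights are standard normal; shrinkage enters
    # only through the multiplicative scales.
    beta = rng.standard_normal(size=(n_out, n_in))
    return tau * lam * beta

W = sample_hs_layer(10, 50)
print(W.shape)  # (50, 10)
```

In inference (rather than this prior-sampling sketch), the half-Cauchy scales are typically re-expressed through inverse-Gamma auxiliaries, which is what makes the closed-form fixed-point updates in the algorithm's inner loop possible.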
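The experiment-setup row specifies a Gaussian regression likelihood with unknown precision γ and a vague Gamma(6, 6) prior on γ. As a minimal worked example (assuming only the densities stated in the table; function names are ours), the corresponding log-densities are:

```python
# Hypothetical helper functions (not from the paper's code) evaluating the
# log-densities named in the experiment setup.
import math

def gaussian_loglik(y, f, gamma):
    """log N(y | f, gamma^{-1}): Gaussian likelihood with precision gamma."""
    return 0.5 * (math.log(gamma) - math.log(2.0 * math.pi)) \
        - 0.5 * gamma * (y - f) ** 2

def gamma_logpdf(x, shape=6.0, rate=6.0):
    """log Gamma(x | shape, rate): the vague Gamma(6, 6) prior on gamma."""
    return shape * math.log(rate) - math.lgamma(shape) \
        + (shape - 1.0) * math.log(x) - rate * x
```

Since the paper approximates the posterior over γ with another Gamma distribution, both terms enter the variational objective analytically; the precision simply rescales the squared residual in the likelihood.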