Deep Kernel Posterior Learning under Infinite Variance Prior Weights
Authors: Jorge Loría, Anindya Bhadra
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets. Numerical demonstrations in simulations and on benchmark UCI data sets that our method performs better in prediction than the competing methods, offers full predictive uncertainty estimates, and is considerably less time-consuming than the method of Loría & Bhadra (2024) that is limited to shallow (one hidden layer) BNNs. |
| Researcher Affiliation | Academia | Jorge Loría, Department of Computer Science, Aalto University, Espoo, Finland (EMAIL); Anindya Bhadra, Department of Statistics, Purdue University, West Lafayette, IN, USA (EMAIL) |
| Pseudocode | Yes | Algorithm 1: A Metropolis–Hastings sampler for the posterior predictive distribution of the deep α-kernel process. Algorithm 2: A Metropolis–Hastings sampler for s₊⁽ℓ⁾ \| s₊⁽ℓ⁺¹⁾, y. Algorithm 3: A Metropolis–Hastings sampler for s₊⁽ᴸ⁾ \| y. |
| Open Source Code | Yes | A full implementation, with examples, is freely available at: https://github.com/loriaJ/deep-alpha-kernel. |
| Open Datasets | Yes | Numerical demonstrations in simulations and on benchmark UCI data sets that our method performs better in prediction than the competing methods... We apply the five methods (Dα-KP, DIWP, NNGP, GP Bayes, and GP MLE) to the three well-known data sets from the UCI repository: Boston, Yacht, and Energy. |
| Dataset Splits | Yes | For training we consider 40 equally spaced input points on [−1, 1] and predict on 100 out-of-sample points... In two dimensions we use the function f(ξ₁, ξ₂) = 5·1{ξ₁>0} + 5·1{ξ₂>0} and generate y(ξ₁, ξ₂) = f(ξ₁, ξ₂) + ε; ε ~ N(0, 0.5²), using a 7×7 uniform grid on [−1, 1]² for training, and a similar 9×9 grid for testing... We generate 20 splits, each with 300 training and 300 testing observations... To this end, we split each of the data sets in 20 different folds, training on 19 and testing on the remaining fold; and repeat the process for each of the folds. |
| Hardware Specification | No | All the times are in seconds and display CPU times. We avoid any type of parallelization throughout our experiments. The paper specifies that experiments were run on CPUs and provides CPU times but does not provide specific models or specifications of the CPUs used. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies (libraries, frameworks) along with their version numbers for the methodology described. While a GitHub link is provided for implementation, the text itself lacks version details. |
| Experiment Setup | Yes | For the deep kernel methods of Aitchison et al. (2021) we train all the models using the same hyperparameters they use, with 8000 total steps with 10⁻² as the step-size for the first half, and 10⁻³ for the second half. For the Dα-KP method we run 3000 MCMC simulations with α = 1, δ = 1. |
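The pseudocode evidence above describes layer-wise Metropolis–Hastings samplers (Algorithms 1–3) for conditionals of the form s₊⁽ℓ⁾ given s₊⁽ℓ⁺¹⁾ and y. The sketch below is a generic random-walk Metropolis–Hastings loop, not the authors' implementation; `log_post` is a stand-in for the paper's conditional log-posterior of a latent scale, and the step size and iteration count (matching the reported 3000 MCMC simulations) are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_post, init, n_iter=3000, step=0.5, seed=None):
    """Generic random-walk Metropolis-Hastings sampler (illustrative sketch).

    log_post : callable returning the (unnormalized) log-posterior at a point;
               a placeholder for a conditional such as s_+^{(l)} | s_+^{(l+1)}, y.
    Returns the chain of samples and the empirical acceptance rate.
    """
    rng = np.random.default_rng(seed)
    x = float(init)
    lp = log_post(x)
    samples = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces
        # to the ratio of posterior densities.
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter
```

In the paper's layered setting, one such update would be run per hidden layer per sweep, conditioning each latent scale on the layer above and the data.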
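The dataset-splits evidence describes the simulation designs: 40 equally spaced 1-D training inputs on [−1, 1] with 100 out-of-sample points, and a 2-D step function on a 7×7 training grid with N(0, 0.5²) noise. A minimal sketch of that data generation follows; the exact indicator form of f is our reading of a garbled extraction, not a verbatim copy of the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D design: 40 equally spaced training inputs on [-1, 1],
# and 100 out-of-sample prediction points.
x_train = np.linspace(-1.0, 1.0, 40)
x_test = np.linspace(-1.0, 1.0, 100)

# 2-D design: step function (indicator form assumed, see lead-in)
# on a 7x7 uniform training grid over [-1, 1]^2, with a 9x9 test grid,
# and additive N(0, 0.5^2) observation noise.
def f(xi1, xi2):
    return 5.0 * (xi1 > 0) + 5.0 * (xi2 > 0)

g_train = np.linspace(-1.0, 1.0, 7)
xi1, xi2 = np.meshgrid(g_train, g_train)
y_train = f(xi1, xi2) + rng.normal(0.0, 0.5, size=xi1.shape)

g_test = np.linspace(-1.0, 1.0, 9)
t1, t2 = np.meshgrid(g_test, g_test)
y_test_truth = f(t1, t2)  # noiseless targets on the 9x9 grid
```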