Deep Kernel Posterior Learning under Infinite Variance Prior Weights

Authors: Jorge Loría, Anindya Bhadra

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type — Experimental. LLM response: "The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets." "Numerical demonstrations in simulations and on benchmark UCI data sets [show] that our method performs better in prediction than the competing methods, offers full predictive uncertainty estimates, and is considerably less time-consuming than the method of Loría & Bhadra (2024), which is limited to shallow (one hidden layer) BNNs."
Researcher Affiliation — Academia. LLM response: "Jorge Loría, Department of Computer Science, Aalto University, Espoo, Finland, EMAIL; Anindya Bhadra, Department of Statistics, Purdue University, West Lafayette, IN, USA, EMAIL"
Pseudocode — Yes. LLM response: "Algorithm 1: A Metropolis–Hastings sampler for the posterior predictive distribution of the deep α-kernel process. Algorithm 2: A Metropolis–Hastings sampler for s_+^(ℓ) | s_+^(ℓ+1), y. Algorithm 3: A Metropolis–Hastings sampler for s_+^(L) | y."
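The three algorithms listed above are all Metropolis–Hastings samplers over layer-wise quantities. A minimal random-walk Metropolis–Hastings sketch in Python is shown below; the function names and the Gaussian proposal are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps=3000, step=0.5, rng=None):
    """Generic random-walk Metropolis-Hastings sampler (illustrative).

    `log_target` is the unnormalized log-density of the quantity being
    sampled, e.g. a layer-wise scale s_+^(l) conditioned on the layer
    above and the data y in the paper's Algorithms 2-3.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    logp = log_target(x)
    for t in range(n_steps):
        # Gaussian random-walk proposal around the current state.
        prop = x + step * rng.standard_normal(x.shape)
        logp_prop = log_target(prop)
        # Accept with probability min(1, target(prop)/target(x)).
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = prop, logp_prop
        samples[t] = x
    return samples

# Toy usage: sample from a standard normal log-density.
draws = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(1))
```

The default of 3000 steps mirrors the 3000 MCMC simulations reported for the Dα-KP method; the proposal scale `step` is a placeholder that would normally be tuned for a reasonable acceptance rate.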
Open Source Code — Yes. LLM response: "A full implementation, with examples, is freely available at: https://github.com/loriaJ/deep-alpha-kernel."
Open Datasets — Yes. LLM response: "Numerical demonstrations in simulations and on benchmark UCI data sets [show] that our method performs better in prediction than the competing methods... We apply the five methods (Dα-KP, DIWP, NNGP, GP Bayes, and GP MLE) to the three well-known data sets from the UCI repository: Boston, Yacht, and Energy."
Dataset Splits — Yes. LLM response: "For training we consider 40 equally-spaced input points on [-1, 1] and predict on 100 out-of-sample points... In two dimensions we use the function f(ξ1, ξ2) = 5·1{ξ1>0} + 5·1{ξ2>0} and generate y(ξ1, ξ2) = f(ξ1, ξ2) + ε, ε ~ N(0, 0.5²), using a 7×7 uniform grid on [-1, 1]² for training, and a similar 9×9 grid for testing... We generate 20 splits, each with 300 training and 300 testing observations... To this end, we split each of the data sets into 20 different folds, training on 19 and testing on the remaining fold; we repeat the process for each of the folds."
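The leave-one-fold-out scheme described above (20 folds, train on 19, test on the held-out fold) can be sketched as follows; the function name, the random seed, and the use of a shuffled partition are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def leave_one_fold_out(n, n_folds=20, seed=0):
    """Partition indices 0..n-1 into `n_folds` folds; for each fold,
    yield (train, test) where train covers the other folds.
    Seed and shuffling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train_idx, test_idx

# Example: 20 folds over the 506 rows of the Boston data set.
splits = list(leave_one_fold_out(506))
```

Each of the 20 (train, test) pairs is disjoint and their test sets jointly cover every observation exactly once, matching the "training on 19, testing on the remaining fold" description.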
Hardware Specification — No. LLM response: "All the times are in seconds and display CPU times. We avoid any type of parallelization throughout our experiments." The paper states that experiments were run on CPUs and reports CPU times, but does not give the specific CPU models or specifications used.
Software Dependencies — No. The paper does not list specific software dependencies (libraries or frameworks) with version numbers for the methodology described. A GitHub link is provided for the implementation, but the text itself lacks version details.
Experiment Setup — Yes. LLM response: "For the deep kernel methods of Aitchison et al. (2021) we train all the models using the same hyperparameters they use, with 8000 total steps with 10⁻² as the step-size for the first half, and 10⁻³ for the second half. For the Dα-KP method we run 3000 MCMC simulations with α = 1, δ = 1."
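The training schedule quoted above is a piecewise-constant step size: 10⁻² for the first 4000 of 8000 steps, then 10⁻³. A minimal sketch, assuming a simple step-indexed function (the function name and interface are illustrative, not from the paper):

```python
def step_size(t, total_steps=8000, early=1e-2, late=1e-3):
    """Piecewise-constant step-size schedule: `early` for the first
    half of training, `late` for the second half. Mirrors the 8000-step
    schedule quoted for the deep kernel baselines of Aitchison et al."""
    return early if t < total_steps // 2 else late
```

In a training loop this would be called once per step, e.g. `lr = step_size(t)` before each parameter update.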