Bayesian Transformed Gaussian Processes

Authors: Xinran Zhu, Leo Huang, Eric Hans Lee, Cameron Alexander Ibrahim, David Bindel

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comparisons between the Bayesian and MLE approaches and provide experimental results for BTG and WGP coupled with 1-layer and 2-layer transformations. We find evidence that BTG is well-suited for low-data regimes, where hyperparameters are under-specified by the data. In these regimes, our empirical testing suggests that BTG provides superior point estimation and uncertainty quantification. [...] This section contains: experimental details necessary to reproduce our experiments, including the datasets we used as well as the decisions we made regarding the BTG and GP models (such as the kernel); experiments to validate the efficiency of our computational techniques; and thorough regression experiments, which demonstrate BTG's strong empirical performance when compared to appropriately selected baselines.
Researcher Affiliation | Collaboration | Xinran Zhu (Cornell University, Center for Applied Mathematics); Leo Huang (Cornell University, Department of Computer Science); Eric Hans Lee (SigOpt, an Intel Company); Cameron Ibrahim (University of Delaware, Department of Computer and Information Sciences); David Bindel (Cornell University, Department of Computer Science)
Pseudocode | Yes | Algorithms 1 and 2 are used for efficiently computing {TParameters(i)}_{i=1}^{n} and {Det(i)}_{i=1}^{n} for fixed hyperparameters (θ, λ). The total time complexity is O(n^3), because the dominant costs are precomputing a Cholesky factorization of a kernel matrix and repeating O(n^2) operations across n submodels. Algorithm 1: T-Distributions of Sub-Models [...] Algorithm 2: Fast Determinant Computation
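The quoted algorithms are in the paper itself; as an illustration of why one factorization can serve all n submodels, here is a NumPy sketch (not the authors' Julia code) that computes every leave-one-out log-determinant from a single O(n^3) solve, using the standard identity det(K_{-i}) = det(K) · (K^{-1})_{ii} for a symmetric positive-definite K. The assumption that the submodels are leave-one-out restrictions of the full kernel matrix is ours.

```python
import numpy as np

def loo_logdets(K):
    """Log-determinants of all n leave-one-out submatrices of an SPD
    matrix K, from one Cholesky factorization.

    Identity used: det(K_{-i}) = det(K) * (K^{-1})_{ii}."""
    L = np.linalg.cholesky(K)                     # single O(n^3) factorization
    logdet_K = 2.0 * np.sum(np.log(np.diag(L)))   # log det(K) from Cholesky diagonal
    Kinv = np.linalg.inv(K)                       # diagonal of the inverse is all we need
    return logdet_K + np.log(np.diag(Kinv))

# Sanity check against deleting row/column i directly.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
K = A @ A.T + 6 * np.eye(6)                       # SPD stand-in for a kernel matrix
fast = loo_logdets(K)
direct = np.array([np.linalg.slogdet(np.delete(np.delete(K, i, 0), i, 1))[1]
                   for i in range(6)])
```

Because the factorization is shared, each additional submodel costs far less than refactorizing from scratch, which matches the O(n^3) total the paper reports.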
Open Source Code | Yes | We develop a modular Julia package for computing with transformed GPs (e.g., BTG and WGP) which exploits vectorized linear algebra operations and supports MLE and Bayesian inference. The code used to run experiments in this paper can be found at https://github.com/xinranzhu/BTG.
Open Datasets | Yes | Two synthetic datasets: Int Sine and Six Hump Camel. The Int Sine dataset, also used by Lázaro-Gredilla (2012), is sampled from a rounded 1-dimensional sine function with Gaussian noise of a given variance. [...] The Six Hump Camel function is a 2-dimensional benchmark optimization function usually evaluated on [−3, 3] × [−2, 2] (Molga & Smutnicki, 2005). [...] Three real datasets: Abalone, Wine Quality and Creep. Abalone is an 8-dimensional dataset, for which the prediction task is to determine the age of an abalone using eight physical measurements (Dua & Graff, 2017). The Wine Quality dataset has 12-dimensional explanatory variables and relates the quality of wine to input attributes (Cortez et al., 2009). The Creep dataset is 30-dimensional and relates the creep rupture stress (in MPa) for steel to chemical composition and other features (Cole et al., 2000).
Dataset Splits | Yes | The Int Sine dataset, also used by Lázaro-Gredilla (2012), is sampled from a rounded 1-dimensional sine function with Gaussian noise of a given variance. The training set is comprised of 51 uniformly spaced samples on [−π, π]. The testing set consists of 400 uniformly spaced points on [−π, π]. [...] The Six Hump Camel function [...] The training set is comprised of 50 quasi-uniform samples, i.e., a 2d Sobol sequence, from [−1, 1] × [−2, 2]. The testing set consists of 400 uniformly distributed points on the same domain. [...] To simulate data-sparse training scenarios, we randomly select training samples of size 30, 200, and 100 from Abalone, Wine Quality, and Creep, respectively, and test on 500, 1000 and 1500 out-of-sample points.
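The Int Sine split above can be sketched as follows. The interpretation of "rounded sine" as np.round applied to sin, the noise standard deviation, and the random seed are illustrative assumptions on our part, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an arbitrary choice
noise_std = 0.1                  # illustrative; the paper only says "a given variance"

# Training set: 51 uniformly spaced inputs on [-pi, pi],
# targets are a rounded sine plus Gaussian noise.
x_train = np.linspace(-np.pi, np.pi, 51)
y_train = np.round(np.sin(x_train)) + noise_std * rng.standard_normal(51)

# Test set: 400 uniformly spaced points on the same interval.
x_test = np.linspace(-np.pi, np.pi, 400)
y_test = np.round(np.sin(x_test))
```

The rounding makes the target a step function, which is exactly the kind of non-Gaussian response that motivates transformed GP models such as BTG and WGP.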
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. It only mentions general implementation details and compute times without hardware context.
Software Dependencies | No | We run all experiments using our Julia software package, which supports a variety of models (WGP, CWGP and BTG) and allows for flexible treatment of hyperparameters. For MLE optimization, we use the L-BFGS algorithm from the Julia Optim package (Mogensen & Riseth, 2018). While Julia and Optim are mentioned, specific version numbers for these software components are not provided, which is required for reproducibility.
Experiment Setup | Yes | Kernel: we used the squared exponential (RBF) kernel for all experiments: kθ(x, x′) = exp(−(1/2)(x − x′)ᵀ Dθ⁻²(x − x′)), where Dθ is the diagonal matrix of per-dimension lengthscales. Model: to model observation input noise for BTG, we add a regularization term to make the analytical marginalization of mean and precision tractable. We also assume the constant covariate m(x) = 1 in the BTG model, and normalize observations to the unit interval. We assume a constant mean field for both BTG and WGP. [...] The tolerance for the root-finding Brent's algorithm is set to 10⁻³. [...] The hyperparameter space is 7-dimensional.
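A minimal NumPy sketch of this squared-exponential kernel with per-dimension (ARD) lengthscales; the vectorized broadcasting layout is our choice, not the package's implementation.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscales):
    """Squared-exponential (RBF) kernel with ARD lengthscales:
    k(x, x') = exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2).

    X1: (n, d) array, X2: (m, d) array, lengthscales: length-d vector."""
    ell = np.asarray(lengthscales, dtype=float)
    diff = (X1[:, None, :] - X2[None, :, :]) / ell   # scale each dimension by its lengthscale
    return np.exp(-0.5 * np.sum(diff**2, axis=-1))   # (n, m) kernel matrix

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, lengthscales=[1.0, 1.0])
# K has unit diagonal and is symmetric
```

With unit lengthscales, the off-diagonal entry for points at distance 1 is exp(−0.5); shrinking a lengthscale makes the kernel decay faster along that dimension, which is what the 7-dimensional hyperparameter space tunes.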