Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models

Authors: Reza Mohammadi, Matthew Pratola, Maurits Kaptein

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical support of the algorithm for Bayesian regression tree models and demonstrate its performance in a simulated example. Keywords: Bayesian Regression trees, Decision trees, Continuous-time MCMC, Bayesian structure learning, Birth-death process, Bayesian model averaging, Bayesian model selection. [...] 4. Empirical evaluation of our sampling approach. We examine here the performance of the proposed CT-MCMC search algorithm based on a simulation scenario that is often used in the regression tree literature. [...] For each of the above search algorithms, we report the following measurements: MSE: This is the Mean Square Error. [...] Effective Sample Size: This is the number of effective independent draws that the algorithm generates.
Researcher Affiliation | Academia | Reza Mohammadi (EMAIL), Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands; Matthew Pratola (EMAIL), Department of Statistics, The Ohio State University, Ohio, USA; Maurits Kaptein (EMAIL), Statistics and Research Methods, University of Tilburg, Tilburg, The Netherlands
Pseudocode | Yes | Algorithm 1. CT-MCMC search algorithm. Input: A tree (T, θ_T), data D. [...] Algorithm 2. CT-MCMC search algorithm exploiting conjugacy. Input: A tree (T, θ_T), data D.
Open Source Code | Yes | The current implementation of the methods proposed in this paper is available at https://bitbucket.org/mpratola/openbt.
Open Datasets | No | The synthetic data set consists of n = 300 data points with (x1, x2, x3) covariates where [...] The response y is calculated for n = 300 data points as: [...] To calculate the MSE, we generate another synthetic data set consisting of n = 300 data points as a test set.
Dataset Splits | Yes | The synthetic data set consists of n = 300 data points with (x1, x2, x3) covariates where [...] The response y is calculated for n = 300 data points as: [...] To calculate the MSE, we generate another synthetic data set consisting of n = 300 data points as a test set.
Hardware Specification | Yes | All the computations were carried out on a MacBook Pro with a 2.9 GHz Quad-Core Intel Core i7 processor.
Software Dependencies | No | We perform all the computations in R, and the computationally intensive tasks are implemented in parallel in C and interfaced in R.
Experiment Setup | Yes | To evaluate the performance of the CT-MCMC search algorithm in comparison with RJ-MCMC, we run all the above search algorithms under the same conditions with 20,000 iterations and 1,000 as a burn-in. [...] Table 1 presents the results for σ^2 = 1, which is a relatively challenging, high-noise scenario.
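The two measurements quoted in the Research Type row (MSE on a held-out test set and effective sample size) can be illustrated with a minimal Python sketch. The excerpts do not state which effective sample size estimator the authors use, so the autocorrelation-based estimator below is an assumption, as are the function names `mse`, `autocorr`, and `effective_sample_size`. The AR(1) chain is only a hypothetical stand-in for an MCMC trace; the 20,000 iterations and 1,000 burn-in draws are the only values taken from the Experiment Setup row.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between held-out responses and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=100):
    """ESS = n / (1 + 2 * sum of autocorrelations).

    Assumed estimator: the sum is truncated at the first non-positive
    autocorrelation or at max_lag, whichever comes first.
    """
    n = len(x)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Hypothetical stand-in for an MCMC trace: a strongly autocorrelated AR(1) chain.
random.seed(0)
n_iter, burn_in = 20_000, 1_000  # run length and burn-in quoted in the table
chain = [0.0]
for _ in range(n_iter - 1):
    chain.append(0.9 * chain[-1] + random.gauss(0.0, 1.0))

kept = chain[burn_in:]  # discard burn-in draws before computing diagnostics
ess = effective_sample_size(kept)
print(f"retained draws: {len(kept)}, estimated ESS: {ess:.0f}")
```

Because successive draws of a correlated chain carry overlapping information, the estimated effective sample size comes out well below the number of retained draws, which is why it is a useful yardstick when comparing samplers run for the same number of iterations.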