Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models
Authors: Reza Mohammadi, Matthew Pratola, Maurits Kaptein
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical support of the algorithm for Bayesian regression tree models and demonstrate its performance in a simulated example. Keywords: Bayesian Regression trees, Decision trees, Continuous-time MCMC, Bayesian structure learning, Birth-death process, Bayesian model averaging, Bayesian model selection. [...] 4. Empirical evaluation of our sampling approach We examine here the performance of the proposed CT-MCMC search algorithm based on a simulation scenario that is often used in the regression tree literature. [...] For each of the above search algorithms, we report the following measurements: MSE: This is the Mean Square Error. [...] Effective Sample Size: This is the number of effective independent draws that the algorithm generates. |
| Researcher Affiliation | Academia | Reza Mohammadi EMAIL Amsterdam Business School University of Amsterdam Amsterdam, The Netherlands Matthew Pratola EMAIL Department of Statistics The Ohio State University Ohio, USA Maurits Kaptein EMAIL Statistics and Research Methods University of Tilburg Tilburg, The Netherlands |
| Pseudocode | Yes | Algorithm 1. CT-MCMC search algorithm. Input: A tree (T, θ_T), data D. [...] Algorithm 2. CT-MCMC search algorithm exploiting conjugacy. Input: A tree (T, θ_T), data D. |
| Open Source Code | Yes | The current implementation of the methods proposed in this paper are available at https://bitbucket.org/mpratola/openbt. |
| Open Datasets | No | The synthetic data set consists of n = 300 data points with (x1, x2, x3) covariates where [...] The response y is calculated for n = 300 data points as: [...] To calculate the MSE, we generate another synthetic data set consists of n = 300 data points as a test set. |
| Dataset Splits | Yes | The synthetic data set consists of n = 300 data points with (x1, x2, x3) covariates where [...] The response y is calculated for n = 300 data points as: [...] To calculate the MSE, we generate another synthetic data set consists of n = 300 data points as a test set. |
| Hardware Specification | Yes | All the computations were carried out on a Mac Book Pro with 2.9 GHz processor and Quad-Core Intel Core i7. |
| Software Dependencies | No | We perform all the computations in R and the computationally intensive tasks are implemented in parallel in C and interfaced in R. |
| Experiment Setup | Yes | To evaluate the performance of the CT-MCMC search algorithm with compare with the RJ-MCMC, we run all the above search algorithms in the same conditions with 20,000 iterations and 1,000 as a burn-in. [...] Table 1 presents the results for σ² = 1 which is a relatively challenging, high-noise, scenario. |
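
The two reported measurements (MSE and effective sample size) and the burn-in setting quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation, which is in R and C. The AR(1) toy chain, the function names, and the ESS estimator (sum of autocorrelations truncated at the first non-positive lag) are all assumptions made for illustration; the paper's exact ESS estimator is not quoted in the table. Only the 20,000 iterations and 1,000 burn-in draws come from the quoted setup.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between test responses and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum_k rho_k), with the sum truncated at the
    first non-positive autocorrelation (a common, simple estimator;
    assumed here, not taken from the paper)."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        # A constant chain has no autocorrelation structure to estimate.
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Settings quoted in the table: 20,000 iterations, first 1,000 discarded.
N_ITER, BURN_IN = 20_000, 1_000

# Hypothetical stand-in for sampler output: an autocorrelated AR(1) chain.
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.normal(size=N_ITER)
chain = np.empty(N_ITER)
chain[0] = 0.0
for t in range(1, N_ITER):
    chain[t] = phi * chain[t - 1] + noise[t]

kept = chain[BURN_IN:]  # drop burn-in draws before computing diagnostics
ess = effective_sample_size(kept)
print(f"draws kept: {kept.size}, estimated ESS: {ess:.0f}")
```

Because the toy chain is strongly autocorrelated, its estimated ESS is far below the 19,000 retained draws, which is the kind of gap the ESS measurement is meant to expose when comparing samplers.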