Oblique Bayesian Additive Regression Trees
Authors: Paul-Hieu V. Nguyen, Ryan Yee, Sameer Deshpande
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using several synthetic and real-world benchmark datasets, we systematically compared our oblique BART implementation to axis-aligned BART and other tree ensemble methods, finding that oblique BART was competitive with and sometimes much better than those methods. |
| Researcher Affiliation | Academia | Paul-Hieu V. Nguyen EMAIL, Department of Statistics, University of Wisconsin-Madison; Ryan Yee EMAIL, Department of Statistics, University of Wisconsin-Madison; Sameer K. Deshpande EMAIL, Department of Statistics, University of Wisconsin-Madison |
| Pseudocode | No | The paper describes the algorithms and methods textually and with diagrams (Figure 3 for grow/prune moves), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks formatted as code. |
| Open Source Code | Yes | An R package implementing oblique BART is available at https://github.com/paulhnguyen/obliqueBART. |
| Open Datasets | Yes | We obtained most of these datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu), the Journal of Applied Econometrics data archive (http://qed.econ.queensu.ca/jae/), and several R packages. See Table A1 for the dimensions of and links to these datasets. |
| Dataset Splits | Yes | We created 20 random 75%-25% training-testing splits of each dataset. |
| Hardware Specification | No | We performed all of our experiments on a shared high-throughput computing cluster. This statement is too general and does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | Yes | We fit BART, RF, ERT, and XGBoost models using implementations available in the R packages BART (Sparapani et al., 2021), randomForest (Liaw & Wiener, 2002), ranger (Wright & Ziegler, 2017), and xgboost (Chen et al., 2024). We additionally note the use of ALGLIB 4.01.0 (Bochkanov, 2023). |
| Experiment Setup | Yes | For oblique BART and BART, we compute posterior means of f(x) (for regression) and P(y = 1|x) (for classification) based on 1000 samples obtained by simulating a single Markov chain for 2000 iterations and discarding the first 1000 as burn-in. For each competing method, we tuned hyperparameters using 5-fold cross-validation on each training dataset. Table A2 shows the grids of values considered for each method's hyperparameters. |
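The evaluation protocol reported above (20 random 75%-25% train/test splits, with hyperparameters tuned by 5-fold cross-validation on each training set) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the dataset, the random forest stand-in, and the hyperparameter grid are all placeholders.

```python
# Sketch of the paper's evaluation protocol (assumed details: synthetic
# data, random forest as a stand-in model, a toy hyperparameter grid).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=120, n_features=5, noise=1.0, random_state=0)

rmses = []
for split in range(20):  # 20 random 75%-25% splits, as in the paper
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=split)
    # Tune hyperparameters with 5-fold CV on the training data only
    grid = GridSearchCV(
        RandomForestRegressor(n_estimators=25, random_state=0),
        param_grid={"max_depth": [2, 4, None]},  # placeholder grid
        cv=5)
    grid.fit(X_tr, y_tr)
    rmses.append(mean_squared_error(y_te, grid.predict(X_te)) ** 0.5)

print(f"mean test RMSE over 20 splits: {np.mean(rmses):.3f}")
```

Tuning inside each training split (rather than once on the full data) keeps the test sets untouched, so the 20 test-set errors give an honest estimate of out-of-sample performance.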