Oblique Bayesian Additive Regression Trees

Authors: Paul-Hieu V. Nguyen, Ryan Yee, Sameer Deshpande

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using several synthetic and real-world benchmark datasets, we systematically compared our oblique BART implementation to axis-aligned BART and other tree ensemble methods, finding that oblique BART was competitive with, and sometimes much better than, those methods.
Researcher Affiliation | Academia | Paul-Hieu V. Nguyen (EMAIL), Department of Statistics, University of Wisconsin-Madison; Ryan Yee (EMAIL), Department of Statistics, University of Wisconsin-Madison; Sameer K. Deshpande (EMAIL), Department of Statistics, University of Wisconsin-Madison
Pseudocode | No | The paper describes its algorithms and methods textually and with diagrams (Figure 3 for the grow/prune moves), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks formatted as code.
Open Source Code | Yes | An R package implementing oblique BART is available at https://github.com/paulhnguyen/obliqueBART.
Open Datasets | Yes | We obtained most of these datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu), the Journal of Applied Econometrics data archive (http://qed.econ.queensu.ca/jae/), and several R packages. See Table A1 for the dimensions of and links to these datasets.
Dataset Splits | Yes | We created 20 random 75%-25% training-testing splits of each dataset.
Hardware Specification | No | We performed all of our experiments on a shared high-throughput computing cluster. This statement is too general: it does not identify specific hardware such as CPU/GPU models or memory.
Software Dependencies | Yes | We fit BART, RF, ERT, and XGBoost models using implementations available in the R packages BART (Sparapani et al., 2021), randomForest (Liaw & Wiener, 2002), ranger (Wright & Ziegler, 2017), and xgboost (Chen et al., 2024). We additionally note the use of ALGLIB 4.01.0 (Bochkanov, 2023).
Experiment Setup | Yes | For oblique BART and BART, we compute posterior means of f(x) (for regression) and P(y = 1|x) (for classification) based on 1000 samples obtained by simulating a single Markov chain for 2000 iterations and discarding the first 1000 as burn-in. For each competing method, we tuned hyperparameters using 5-fold cross-validation on each training dataset. Table A2 shows the grids of values considered for each method's hyperparameters.
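The splitting protocol in the Dataset Splits row (20 random 75%-25% training-testing splits per dataset) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `make_splits` and the seed handling are our own choices.

```python
import random

def make_splits(n_obs, n_splits=20, train_frac=0.75, seed=0):
    """Return n_splits random (train_idx, test_idx) partitions of range(n_obs).

    Illustrative sketch of the paper's 20 random 75%-25% splits;
    all names and the seeding scheme are assumptions, not the authors' code.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        idx = list(range(n_obs))
        rng.shuffle(idx)                      # random permutation of observations
        cut = int(round(train_frac * n_obs))  # 75% of observations go to training
        splits.append((idx[:cut], idx[cut:]))
    return splits

splits = make_splits(100)
train_idx, test_idx = splits[0]
# 20 splits; each split is a disjoint 75/25 partition of the 100 observations
```

Each of the 20 splits would then be used to fit every method on the training portion and score it on the held-out 25%.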
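The Experiment Setup row describes posterior means computed from 1000 retained MCMC draws (a 2000-iteration chain with the first 1000 discarded as burn-in). A minimal sketch of that averaging step, with dummy draws standing in for actual samples of f(x) or P(y = 1|x), is below; the function and variable names are ours, not from the paper's implementation.

```python
import random

def posterior_mean(draws, burn_in=1000):
    """Average the post-burn-in MCMC draws of a quantity of interest.

    Sketch of the setup described above: a single chain of 2000 draws,
    with the first 1000 discarded as burn-in. Names are assumptions.
    """
    kept = draws[burn_in:]
    return sum(kept) / len(kept)

# Stand-in chain: 2000 synthetic "draws" around a true value of 2.0
# (in practice these would be MCMC samples of f(x) or P(y = 1 | x)).
rng = random.Random(1)
chain = [rng.gauss(2.0, 0.5) for _ in range(2000)]
est = posterior_mean(chain)  # mean of the last 1000 draws
```

For classification, the same averaging applied to draws of P(y = 1|x) yields the posterior mean probability, which can then be thresholded to produce class labels.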