Optimal Weighted Random Forests

Authors: Xinyu Chen, Dalei Yu, Xinyu Zhang

JMLR 2024

Reproducibility summary (variable — result — LLM response):
Research Type — Experimental. "Numerical studies conducted on real-world and semi-synthetic data sets indicate that these algorithms outperform the equal-weight forest and two other weighted RFs proposed in the existing literature in most cases."
Researcher Affiliation — Academia. Xinyu Chen (EMAIL), International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China. Dalei Yu (EMAIL), School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China. Xinyu Zhang (EMAIL), Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; and International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China.
Pseudocode — Yes. Algorithm 1: 1step-WRFopt; Algorithm A.1: CART; Algorithm B.1: 2steps-WRFopt.
Open Source Code — Yes. "The data and codes are available publicly on https://github.com/XinyuChen-hey/OptimalWeighted-Random-Forests."
Open Datasets — Yes. "To assess the prediction performance of different weighted RFs in practical situations, we used 11 data sets from the UCI data repository for machine learning (Dua and Graff, 2017). Because most of these data sets are low-dimensional, one additional high-dimensional data set from openml.org (Vanschoren et al., 2013) was also included."
Dataset Splits — Yes. "We randomly partitioned each data set into training data, testing data and validation data, in the ratio of 0.5 : 0.3 : 0.2."
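The 0.5 : 0.3 : 0.2 partition described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name and seed are assumptions.

```python
import numpy as np

def three_way_split(n, ratios=(0.5, 0.3, 0.2), seed=0):
    """Return index arrays for a random train/test/validation partition."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # shuffle the row indices once
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    train = perm[:n_train]
    test = perm[n_train:n_train + n_test]
    valid = perm[n_train + n_test:]    # remainder goes to validation
    return train, test, valid

train, test, valid = three_way_split(1000)
print(len(train), len(test), len(valid))  # 500 300 200
```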
Hardware Specification — No. The paper does not explicitly mention any specific hardware used for its experiments; it refers to computational resources only in general terms, without details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies — No. "Many contemporary software packages, such as quadprog in R or MATLAB, can effectively handle quadratic programming problems." ... the R package randomForest ... "generated additional attributes by sklearn.preprocessing.PolynomialFeatures (Pedregosa et al., 2011)."
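The quadratic program referenced above (choosing non-negative tree weights that sum to one) can be illustrated in Python with SciPy's SLSQP solver in place of R's quadprog. Everything here is a toy sketch under assumed data: P holds per-tree predictions, and the objective is the in-sample squared error of the weighted ensemble.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n, M = 50, 5                              # observations, trees
P = rng.normal(size=(n, M))               # toy per-tree predictions
y = P[:, 0] + 0.1 * rng.normal(size=n)    # response close to tree 0

def objective(w):
    """Squared error of the weighted ensemble prediction."""
    r = P @ w - y
    return r @ r

w0 = np.full(M, 1.0 / M)                  # start from equal weights
res = minimize(
    objective, w0, method="SLSQP",
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    bounds=[(0.0, None)] * M,             # weights must be non-negative
)
w = res.x
print(np.round(w, 3), round(w.sum(), 6))
```

Because the toy response tracks the first tree, the solver should concentrate most of the weight there, while the simplex constraints keep the result a valid weighting.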
Experiment Setup — Yes. "In this section, the number of trees M_n was set to 100. Before each split, the dimension of the random feature sub-space q was set to p/3, which is the default value in the regression mode of the R package randomForest. We set the minimum leaf size nodesize to n in CART trees and 5 in SUT trees, in order to control the depth of trees."
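An equal-weight baseline forest with the stated settings (100 trees, q = p/3 features per split, minimum leaf size 5) can be sketched with scikit-learn rather than the authors' R code; the synthetic data and seed are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=9, noise=0.5, random_state=0)
p = X.shape[1]

rf = RandomForestRegressor(
    n_estimators=100,             # M_n = 100 trees
    max_features=max(1, p // 3),  # q = p/3 features tried per split
    min_samples_leaf=5,           # nodesize = 5 (as for SUT trees)
    random_state=0,
)
rf.fit(X, y)
print(round(rf.score(X, y), 2))   # in-sample R^2 of the equal-weight forest
```

scikit-learn's `max_features` and `min_samples_leaf` play the same roles as `mtry` and `nodesize` in the R randomForest package, so this gives a comparable equal-weight baseline against which weighted variants would be measured.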