Optimal Weighted Random Forests

Authors: Xinyu Chen, Dalei Yu, Xinyu Zhang

JMLR 2024

Reproducibility summary (variable — result — LLM response):
Research Type — Experimental. "Numerical studies conducted on real-world and semi-synthetic data sets indicate that these algorithms outperform the equal-weight forest and two other weighted RFs proposed in the existing literature in most cases."
Researcher Affiliation — Academia. Xinyu Chen (EMAIL), International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China. Dalei Yu (EMAIL), School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China. Xinyu Zhang (EMAIL), Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; and International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China.
Pseudocode — Yes. Algorithm 1: 1step-WRFopt; Algorithm A.1: CART; Algorithm B.1: 2steps-WRFopt.
Open Source Code — Yes. "The data and codes are available publicly on https://github.com/XinyuChen-hey/OptimalWeighted-Random-Forests."
Open Datasets — Yes. "To assess the prediction performance of different weighted RFs in practical situations, we used 11 data sets from the UCI data repository for machine learning (Dua and Graff, 2017). Because most of these data sets are low-dimensional, one additional high-dimensional data set from openml.org (Vanschoren et al., 2013) was also included."
Dataset Splits — Yes. "We randomly partitioned each data set into training data, testing data and validation data, in the ratio of 0.5 : 0.3 : 0.2."
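The 0.5 : 0.3 : 0.2 partition described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name and seed are assumptions.

```python
import numpy as np

def three_way_split(n, ratios=(0.5, 0.3, 0.2), seed=0):
    """Return index arrays for a random train/test/validation partition."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # shuffle the row indices once
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    train = perm[:n_train]
    test = perm[n_train:n_train + n_test]
    valid = perm[n_train + n_test:]    # remainder goes to validation
    return train, test, valid

train, test, valid = three_way_split(1000)
print(len(train), len(test), len(valid))  # 500 300 200
```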
Hardware Specification — No. The paper does not explicitly mention any specific hardware used for its experiments; it refers to computational resources only in general terms, without details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies — No. "Many contemporary software packages, such as quadprog in R or MATLAB, can effectively handle quadratic programming problems." ... the R package randomForest ... "generated additional attributes by sklearn.preprocessing.PolynomialFeatures (Pedregosa et al., 2011)."
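The quadratic program referenced above (choosing non-negative tree weights that sum to one) can be illustrated in Python with SciPy's SLSQP solver in place of R's quadprog. Everything here is a toy sketch under assumed data: P holds per-tree predictions, and the objective is the in-sample squared error of the weighted ensemble.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n, M = 50, 5                              # observations, trees
P = rng.normal(size=(n, M))               # toy per-tree predictions
y = P[:, 0] + 0.1 * rng.normal(size=n)    # response close to tree 0

def objective(w):
    """Squared error of the weighted ensemble prediction."""
    r = P @ w - y
    return r @ r

w0 = np.full(M, 1.0 / M)                  # start from equal weights
res = minimize(
    objective, w0, method="SLSQP",
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    bounds=[(0.0, None)] * M,             # weights must be non-negative
)
w = res.x
print(np.round(w, 3), round(w.sum(), 6))
```

Because the toy response tracks the first tree, the solver should concentrate most of the weight there, while the simplex constraints keep the result a valid weighting.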
Experiment Setup — Yes. "In this section, the number of trees M_n was set to 100. Before each split, the dimension of the random feature sub-space q was set to p/3, which is the default value in the regression mode of the R package randomForest. We set the minimum leaf size nodesize to n in CART trees and 5 in SUT trees, in order to control the depth of trees."
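An equal-weight baseline forest with the stated settings (100 trees, q = p/3 features per split, minimum leaf size 5) can be sketched with scikit-learn rather than the authors' R code; the synthetic data and seed are assumptions for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=9, noise=0.5, random_state=0)
p = X.shape[1]

rf = RandomForestRegressor(
    n_estimators=100,             # M_n = 100 trees
    max_features=max(1, p // 3),  # q = p/3 features tried per split
    min_samples_leaf=5,           # nodesize = 5 (as for SUT trees)
    random_state=0,
)
rf.fit(X, y)
print(round(rf.score(X, y), 2))   # in-sample R^2 of the equal-weight forest
```

scikit-learn's `max_features` and `min_samples_leaf` play the same roles as `mtry` and `nodesize` in the R randomForest package, so this gives a comparable equal-weight baseline against which weighted variants would be measured.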