Riemann-Lebesgue Forest for Regression
Authors: Tian Qin, Wei-Min Huang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted in Section 4 to illustrate the competitive performance of RLF with small local random forests against other ensemble methods. Simulation results for tuned RLF in models with a small signal-to-noise ratio and a mixture distribution are also provided. |
| Researcher Affiliation | Academia | Tian Qin, Department of Mathematics, Lehigh University; Wei-Min Huang, Department of Mathematics, Lehigh University |
| Pseudocode | Yes | Algorithm 1: Riemann-Lebesgue Tree (Fitting); Algorithm 2: Riemann-Lebesgue Forest prediction at x |
| Open Source Code | Yes | The R code for the implementation of RLF, the selected real datasets, and the simulation results are available in the supplementary materials. |
| Open Datasets | Yes | We used 10-fold stratified cross-validation to compare the performance of RLF and RF on 30 benchmark real datasets from Fischer et al. (2023). All datasets used in the experiments are available on OpenML (Fischer et al., 2023). |
| Dataset Splits | Yes | We used 10-fold stratified cross-validation to compare the performance of RLF and RF on 30 benchmark real datasets from Fischer et al. (2023). We performed 5-fold cross-validation so that 20% of observations are used as the test set. From the remaining 80% of points, we further randomly pick 25% as the validation set, which is used to select the best models over the parameter space. As a result, the ratio of training, validation, and testing is 6:2:2. |
| Hardware Specification | Yes | All simulations and experiments were performed on a laptop with a 12th Gen Intel(R) Core(TM) i7-12700H (2.30 GHz), 16.0 GB RAM, and an NVIDIA 4070 GPU. |
| Software Dependencies | No | The R code for the implementation of RLF, the selected real datasets, and the simulation results are available in the supplementary materials. (The paper mentions 'R codes' but does not specify version numbers for R or for any R packages/libraries, which is required for reproducibility.) |
| Experiment Setup | Yes | The default setting for RLF is M = 100, Mlocal = 10, α = 0.632, Mnode = 5. The same setting, except Mlocal, applies to RF. To obtain, for example, the test MSE as a function of the number of global trees, we set M = 1, 2, ..., 500 while the other parameters remain at the default setting. A similar strategy was applied to the other three hyperparameters of interest. For RF, we set the subagging ratio α ∈ {0.5, 0.63, 0.8}, minimal node size Mnode ∈ {5, 10, 15}, and number of trees M ∈ {50, 100, 150, 200}. For RLF, we keep M = 100 and α = 0.63 throughout for efficiency. We set p ∈ {0.2, 0.4, 0.6, 0.8} and Mlocal ∈ {10, 20, 50}, the two new hyperparameters introduced in RLF. |
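The split protocol and tuning grids quoted above can be sanity-checked with a short sketch. This is not the authors' R implementation; the function name `split_indices` and the use of plain Python are illustrative assumptions. It reproduces the 6:2:2 train/validation/test arithmetic (one 5-fold CV hold-out as test, then 25% of the remainder as validation) and the RF/RLF grid sizes.

```python
import random
from itertools import product

def split_indices(n, seed=0):
    """Illustrative split mirroring the paper's protocol: 5-fold CV holds
    out 20% as the test set, then 25% of the remaining 80% becomes the
    validation set, giving a 6:2:2 train/validation/test ratio."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = n // 5                       # 20% test (one CV fold)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = len(rest) // 4                # 25% of the remaining 80% -> 20% overall
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Tuning grids from the quoted experiment setup (variable names assumed):
rf_grid = list(product([0.5, 0.63, 0.8],       # subagging ratio alpha
                       [5, 10, 15],            # minimal node size Mnode
                       [50, 100, 150, 200]))   # number of global trees M
rlf_grid = list(product([0.2, 0.4, 0.6, 0.8],  # p
                        [10, 20, 50]))         # Mlocal

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # -> 600 200 200
print(len(rf_grid), len(rlf_grid))      # -> 36 12
```

The grid sizes (36 RF configurations, 12 RLF configurations) make concrete why the authors fix M and α for RLF: tuning only p and Mlocal keeps the model-selection search small.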