Riemann-Lebesgue Forest for Regression

Authors: Tian Qin, Wei-Min Huang

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted in Section 4 to illustrate the competitive performance of RLF with small local random forests against other ensemble methods. Simulation results for tuned RLF in models with a small signal-to-noise ratio and mixture distributions are also provided.
Researcher Affiliation | Academia | Tian Qin, Department of Mathematics, Lehigh University; Wei-Min Huang, Department of Mathematics, Lehigh University.
Pseudocode | Yes | Algorithm 1 (Riemann-Lebesgue Tree fitting); Algorithm 2 (Riemann-Lebesgue Forest prediction at x).
Open Source Code | Yes | The R code for the implementation of RLF, the selected real datasets, and the simulation results are available in the supplementary materials.
Open Datasets | Yes | 10-fold stratified cross-validation was used to compare the performance of RLF and RF on 30 benchmark real datasets from Fischer et al. (2023); all datasets used in the experiments are available on OpenML (Fischer et al., 2023).
Dataset Splits | Yes | 10-fold stratified cross-validation was used to compare RLF and RF on the 30 benchmark datasets. Separately, 5-fold cross-validation ensures that 20% of observations are used as the test set; of the remaining 80%, 25% are randomly picked as a validation set used to select the best models over the parameter space. The resulting training:validation:test ratio is 6:2:2.
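The 6:2:2 split described above can be sketched as follows (a hypothetical illustration in Python; the paper's own implementation is in R and is provided in its supplementary materials):

```python
import random

def split_indices(n, seed=0):
    """Return (train, val, test) index lists in a 6:2:2 ratio.

    Mirrors the paper's scheme: hold out 20% as the test set (one fold
    of 5-fold CV), then randomly pick 25% of the remaining 80% as the
    validation set, which works out to 20% of all observations.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = n // 5                 # 20% test
    test = idx[:n_test]
    rest = idx[n_test:]
    n_val = len(rest) // 4          # 25% of the remaining 80% -> 20% overall
    val = rest[:n_val]
    train = rest[n_val:]
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 600 200 200
```

Note this sketch draws a single random split; the paper additionally repeats the procedure across cross-validation folds.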
Hardware Specification | Yes | All simulations and experiments were performed on a laptop with a 12th Gen Intel(R) Core(TM) i7-12700H (2.30 GHz), 16.0 GB RAM, and an NVIDIA 4070 GPU.
Software Dependencies | No | The paper states that the R code for RLF is available in the supplementary materials, but it does not specify version numbers for R or for any R packages/libraries, which is required for reproducibility.
Experiment Setup | Yes | The default setting for RLF is M = 100, Mlocal = 10, α = 0.632, Mnode = 5; the same setting, except Mlocal, applies to RF. To obtain, for example, the test MSE as a function of the number of global trees, M is varied over 1, 2, ..., 500 while the other parameters remain at the default setting; the same strategy is applied to the other three hyperparameters of interest. For tuning, RF uses subagging ratio α ∈ {0.5, 0.63, 0.8}, minimal node size Mnode ∈ {5, 10, 15}, and number of trees M ∈ {50, 100, 150, 200}. For RLF, M = 100 and α = 0.63 are kept fixed for efficiency, and the two new parameters introduced in RLF are tuned over p ∈ {0.2, 0.4, 0.6, 0.8} and Mlocal ∈ {10, 20, 50}.
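The tuning grids above are small enough to enumerate exhaustively. A minimal sketch (hypothetical; parameter names follow the paper's notation, not any library's API):

```python
from itertools import product

# RF grid: subagging ratio alpha, minimal node size Mnode, number of trees M.
rf_grid = [dict(alpha=a, Mnode=n, M=m)
           for a, n, m in product([0.5, 0.63, 0.8],
                                  [5, 10, 15],
                                  [50, 100, 150, 200])]

# RLF grid: M = 100 and alpha = 0.63 are held fixed for efficiency;
# only the two new parameters p and Mlocal are tuned.
rlf_grid = [dict(M=100, alpha=0.63, p=p, Mlocal=ml)
            for p, ml in product([0.2, 0.4, 0.6, 0.8], [10, 20, 50])]

print(len(rf_grid), len(rlf_grid))  # 36 12
```

Keeping M and α fixed for RLF reduces its search space from a full four-parameter grid to 12 candidate configurations, versus 36 for RF.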