Riemann-Lebesgue Forest for Regression
Authors: Tian Qin, Wei-Min Huang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted in Section 4 to illustrate the competitive performance of RLF with small local random forests against other ensemble methods. Simulation results for tuned RLF in models with a small signal-to-noise ratio and a mixture distribution are also provided. |
| Researcher Affiliation | Academia | Tian Qin, Department of Mathematics, Lehigh University; Wei-Min Huang, Department of Mathematics, Lehigh University |
| Pseudocode | Yes | Algorithm 1: Riemann-Lebesgue Tree (Fitting); Algorithm 2: Riemann-Lebesgue Forest prediction at x |
| Open Source Code | Yes | The R code for the implementation of RLF, the selected real datasets, and the simulation results are available in the supplementary materials. |
| Open Datasets | Yes | We used 10-fold stratified cross-validation to compare the performance of RLF and RF on 30 benchmark real datasets from Fischer et al. (2023). All datasets used in the experiments are available on OpenML (Fischer et al., 2023). |
| Dataset Splits | Yes | We used 10-fold stratified cross-validation to compare the performance of RLF and RF on 30 benchmark real datasets from Fischer et al. (2023). We performed 5-fold cross-validation so that 20% of observations are used as the test set. From the remaining 80% of points, we further randomly pick 25% as the validation set, which is used to select the best models over the parameter space. As a result, the ratio of training, validation, and testing is 6:2:2. |
| Hardware Specification | Yes | All simulations and experiments were performed on a laptop with a 12th Gen Intel(R) Core(TM) i7-12700H (2.30 GHz), 16.0 GB RAM, and an NVIDIA 4070 GPU. |
| Software Dependencies | No | The R code for the implementation of RLF, the selected real datasets, and the simulation results are available in the supplementary materials. (The paper mentions 'R codes' but does not specify version numbers for R or for any R packages/libraries, which is required for reproducibility.) |
| Experiment Setup | Yes | The default setting for RLF is M = 100, Mlocal = 10, α = 0.632, Mnode = 5. The same setting, except Mlocal, applies to RF. To obtain, for example, the test MSE as a function of the number of global trees, we set M = 1, 2, ..., 500 while the other parameters remain at the default setting. A similar strategy was applied to the other three hyperparameters of interest. For RF, we set the subagging ratio α ∈ {0.5, 0.63, 0.8}, minimal node size Mnode ∈ {5, 10, 15}, and number of trees M ∈ {50, 100, 150, 200}. For RLF, we keep M = 100 and α = 0.63 throughout for efficiency. We set p ∈ {0.2, 0.4, 0.6, 0.8} and Mlocal ∈ {10, 20, 50}, the two new hyperparameters introduced in RLF. |
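The split protocol and tuning grids quoted above can be sanity-checked with a short sketch. This is not the authors' R implementation; the function name `split_indices` and the use of plain Python are illustrative assumptions. It reproduces the 6:2:2 train/validation/test arithmetic (one 5-fold CV hold-out as test, then 25% of the remainder as validation) and the RF/RLF grid sizes.

```python
import random
from itertools import product

def split_indices(n, seed=0):
    """Illustrative split mirroring the paper's protocol: 5-fold CV holds
    out 20% as the test set, then 25% of the remaining 80% becomes the
    validation set, giving a 6:2:2 train/validation/test ratio."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = n // 5                       # 20% test (one CV fold)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = len(rest) // 4                # 25% of the remaining 80% -> 20% overall
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Tuning grids from the quoted experiment setup (variable names assumed):
rf_grid = list(product([0.5, 0.63, 0.8],       # subagging ratio alpha
                       [5, 10, 15],            # minimal node size Mnode
                       [50, 100, 150, 200]))   # number of global trees M
rlf_grid = list(product([0.2, 0.4, 0.6, 0.8],  # p
                        [10, 20, 50]))         # Mlocal

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # -> 600 200 200
print(len(rf_grid), len(rlf_grid))      # -> 36 12
```

The grid sizes (36 RF configurations, 12 RLF configurations) make concrete why the authors fix M and α for RLF: tuning only p and Mlocal keeps the model-selection search small.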