Random Forest Weighted Local Fréchet Regression with Random Objects

Authors: Rui Qiu, Zhou Yu, Ruoqing Zhu

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical studies show the superiority of our methods with several commonly encountered types of responses such as distribution functions, symmetric positive-definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through the application to New York taxi data and human mortality data. Keywords: metric space, Fréchet regression, random forest, nonparametric regression, infinite order U-process
Researcher Affiliation Academia Rui Qiu EMAIL School of Statistics, KLATASDS-MOE East China Normal University Shanghai 200062, China Zhou Yu EMAIL School of Statistics, KLATASDS-MOE East China Normal University Shanghai 200062, China Ruoqing Zhu EMAIL Department of Statistics University of Illinois at Urbana-Champaign Champaign, IL 61820, USA
Pseudocode Yes Algorithm 1: Variable importance calculation
Inputs: A training set Dn = {(Xi, Yi)}_{i=1}^{n}, number of Fréchet trees B.
Step 1. Construct a random forest consisting of B Fréchet trees {Tb(x; Db_n, ξb)}_{b=1}^{B} based on Dn, which generate the random forest kernel for the achievement of RFWLCFR.
Step 2. for i = 1 to n do
Identify the collection Ti of Fréchet trees whose growth (Xi, Yi) did not participate in: Ti = {Tb(x; Db_n, ξb) : 1 ≤ b ≤ B, (Xi, Yi) ∉ Db_n}.
Predict the response of Xi with RFWLCFR, denoted by r̂_oob(Xi), based on the random forest kernel provided by Ti.
end for
Record the mean squared error: R0 = (1/n) Σ_{i=1}^{n} d²(r̂_oob(Xi), Yi).
Step 3. for j = 1 to p do
Permute the values for the jth variable randomly in {Xi}_{i=1}^{n} and repeat Step 2 with the permuted data and the same Ti, 1 ≤ i ≤ n, acquired in Step 2; record the corresponding mean squared error Rj.
end for
Step 4. Calculate the variable importance for the jth variable: VI(X^(j)) = Rj − R0, 1 ≤ j ≤ p.
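The permutation scheme of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's R implementation: `oob_predict` (a callable returning, for each sample, a prediction that uses only trees whose subsample excluded it) and `dist` (the metric d on the response space) are hypothetical placeholders.

```python
import numpy as np

def permutation_importance(X, y, oob_predict, dist, rng=None):
    """Permutation variable importance in the spirit of Algorithm 1.

    oob_predict(X) -> out-of-bag predictions for each row of X;
    dist(a, b)     -> metric d between two responses.
    Both names are illustrative placeholders, not the paper's API.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Baseline out-of-bag mean squared distance R0.
    r0 = np.mean([dist(yhat, yi) ** 2 for yhat, yi in zip(oob_predict(X), y)])
    vi = np.empty(p)
    for j in range(p):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the j-th feature's link to y
        rj = np.mean([dist(yhat, yi) ** 2 for yhat, yi in zip(oob_predict(Xp), y)])
        vi[j] = rj - r0  # VI(X^(j)) = R_j - R_0
    return vi
```

A feature that the out-of-bag predictions never use yields VI near zero, while permuting an influential feature inflates the error and yields a large VI.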
Open Source Code No The paper does not explicitly state that source code for the methodology described in this paper is openly available, nor does it provide a direct link to a code repository. It mentions that "Julia code for the implementation of IFR can be found in the GitHub platform" and "Our RFWLCFR and RFWLLFR are also implemented in R," but these refer to a third-party tool and a general statement of implementation, without providing specific access to *their* code.
Open Datasets Yes The New York City Taxi and Limousine Commission provides detailed records on yellow taxi rides, including pick-up and drop-off dates and times, pick-up and drop-off locations, trip distances, payment types, and other information. The data can be downloaded from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. We also gather weather data for January and February 2019 from https://www.wunderground.com/history/daily/us/ny/new-york-city/KLGA/date. The data are collected from United Nations Databases (http://data.un.org/) and UN World Population Prospects 2019 Databases (https://population.un.org/wpp/Download).
Dataset Splits Yes The data set consisting of 1416 samples is partitioned randomly into three parts for Fréchet regression: a training set of size 850, a validation set of size 283, and a testing set of size 283, following a ratio of 6 : 2 : 2. We then perform 9-fold testing to evaluate the performance of all Fréchet regression methods. Specifically, we divide the 162 countries into 9 parts evenly and conduct 9 training runs. For each run, one of the 9 parts is chosen as the testing set and the rest as the training set.
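The two evaluation protocols described above (a random 6 : 2 : 2 train/validation/test partition, and 9 roughly even folds each serving once as the test set) can be sketched as below; the helper names are illustrative, not from the paper.

```python
import numpy as np

def split_6_2_2(n, seed=0):
    """Random 6:2:2 train/validation/test partition of n sample indices,
    mirroring the paper's 1416 = 850 + 283 + 283 split (illustrative helper)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = round(0.6 * n)
    n_val = round(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def nine_fold(n_units, seed=0):
    """Shuffle n_units items (e.g. 162 countries) into 9 roughly even folds;
    each fold is used once as the testing set, the rest as training."""
    idx = np.random.default_rng(seed).permutation(n_units)
    return np.array_split(idx, 9)
```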
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It focuses on the methods and datasets.
Software Dependencies No The paper mentions R-package frechet (Chen et al., 2020), R-package FrechForest (Capitaine, 2021), Julia code for IFR, and R-package matrix-manifold (Lin, 2020), but it does not specify version numbers for these software dependencies, only the year of their publication or creation.
Experiment Setup Yes There are three hyperparameters for each Fréchet tree: the size sn of each subsample, the depth of Fréchet trees, and the number of features randomly selected at each internal node. The choice of sn is very tedious and time-consuming. Here we instead acquire all subsamples by sampling from the training data set Dn with replacement, which is commonly used in random forest codes. When the size n of Dn is large enough, each subsample is expected to contain a fraction (1 − 1/e) ≈ 63.2% of the unique examples of Dn. We consider 3 to log2(n) for the range of tuning the depth of Fréchet trees, where n is the number of training samples. For a fair comparison, each method chooses the hyperparameters by cross-validation.
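The 63.2% unique-sample fraction quoted above follows from sampling with replacement: a bootstrap sample of size n contains on average a fraction 1 − (1 − 1/n)^n of the distinct training points, which tends to 1 − 1/e as n grows. A quick numerical check (illustrative only):

```python
import math
import numpy as np

def unique_fraction(n, seed=0):
    """Fraction of distinct indices in one bootstrap sample of size n
    drawn with replacement from {0, ..., n-1}."""
    sample = np.random.default_rng(seed).integers(0, n, size=n)
    return len(np.unique(sample)) / n

# Limit of 1 - (1 - 1/n)^n as n grows.
limit = 1 - 1 / math.e  # about 0.632
```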