reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Functional L-Optimality Subsampling for Functional Generalized Linear Models with Massive Data

Authors: Hua Liu, Jinhong You, Jiguo Cao

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The analysis results from extensive simulation studies and from the kidney transplant data show that the functional L-optimality subsampling (FLoS) method is much better than the uniform subsampling approach and can well approximate the results based on the full data while dramatically reducing the computation time and memory.
Researcher Affiliation	Academia	Hua Liu EMAIL School of Economics and Finance, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Jinhong You EMAIL School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China; Jiguo Cao EMAIL Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Pseudocode	Yes	Algorithm 1: FLoS Algorithm for Functional Generalized Linear Model
Open Source Code	Yes	In addition, an R package SubsamplingFunPredictors has been developed for implementing the FLoS method. The R package and the R codes for the simulation studies can be downloaded at https://github.com/caojiguo/FLoS.
Open Datasets	Yes	The organ transplant data from the Organ Procurement Transplant Network/United Network for Organ Sharing (OPTN/UNOS) as of September 2020 is a massive functional data, which is available at https://optn.transplant.hrsa.gov/ with the permission of OPTN/UNOS.
Dataset Splits	No	The paper describes how the kidney transplant data recipients were categorized (e.g., 23.3% for Y=0, 76.7% for Y=1), and simulation studies generated data, but it does not specify explicit training/test/validation dataset splits used for model evaluation in the conventional sense.
Hardware Specification	Yes	All computations are carried out on a computation platform with Intel Xeon 5 CPU with 4 cores and 8G memory.
Software Dependencies	Yes	This paper uses the R programming language (enhanced R distribution Microsoft R 4.0.2) to implement each method.
Experiment Setup	Yes	In the implementation, we make the number of knots K = ⌈1.25 n^{1/4}⌉ according to Assumption 5, where ⌈a⌉ means the least integer greater than or equal to a. ... Usually, in practice, we choose p = 3 and K is chosen to be relatively large so that local features of β(t) can be captured. Once K and p are fixed, we can select the smoothing parameter λ by minimizing BIC.
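
The knot rule quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It assumes the rule reads K = ceil(1.25 * n^(1/4)), with n taken to be the sample size; the function name `num_knots` and the example values of n are hypothetical, and this is not the authors' R implementation.

```python
import math


def num_knots(n: int, c: float = 1.25) -> int:
    """Return ceil(c * n**(1/4)): the least integer >= c * n^(1/4).

    Assumption: n is the sample size and c = 1.25 as quoted in the
    Experiment Setup row. The function name is hypothetical.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return math.ceil(c * n ** 0.25)


if __name__ == "__main__":
    # Illustrative sample sizes only; they are not taken from the paper.
    for n in (10_000, 100_000, 1_000_000):
        print(f"n = {n:>9,d}  ->  K = {num_knots(n)}")
```

Because K grows only as the fourth root of n, the number of knots stays small even for massive data sets, which fits the row's remark that K is fixed first and the smoothing parameter λ is then selected by minimizing BIC.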