Functional L-Optimality Subsampling for Functional Generalized Linear Models with Massive Data

Authors: Hua Liu, Jinhong You, Jiguo Cao

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The analysis results from extensive simulation studies and from the kidney transplant data show that the functional L-optimality subsampling (FLoS) method is much better than the uniform subsampling approach and can well approximate the results based on the full data while dramatically reducing the computation time and memory.
Researcher Affiliation | Academia | Hua Liu (EMAIL), School of Economics and Finance, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Jinhong You (EMAIL), School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China; Jiguo Cao (EMAIL), Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Pseudocode | Yes | Algorithm 1: FLoS Algorithm for Functional Generalized Linear Model
Open Source Code | Yes | In addition, an R package SubsamplingFunPredictors has been developed for implementing the FLoS method. The R package and the R codes for the simulation studies can be downloaded at https://github.com/caojiguo/FLoS.
Open Datasets | Yes | The organ transplant data from the Organ Procurement Transplant Network/United Network for Organ Sharing (OPTN/UNOS) as of September 2020 is a massive functional data, which is available at https://optn.transplant.hrsa.gov/ with the permission of OPTN/UNOS.
Dataset Splits | No | The paper describes how the kidney transplant data recipients were categorized (e.g., 23.3% for Y=0, 76.7% for Y=1), and simulation studies generated data, but it does not specify explicit training/test/validation dataset splits used for model evaluation in the conventional sense.
Hardware Specification | Yes | All computations are carried out on a computation platform with Intel Xeon 5 CPU with 4 cores and 8G memory.
Software Dependencies | Yes | This paper uses the R programming language (enhanced R distribution Microsoft R 4.0.2) to implement each method.
Experiment Setup | Yes | In the implementation, we make the number of knots $K = \lceil 1.25\, n^{1/4} \rceil$ according to Assumption 5, where $\lceil a \rceil$ means the least integer greater than or equal to $a$. ... Usually, in practice, we choose p = 3 and K is chosen to be relatively large so that local features of β(t) can be captured. Once K and p are fixed, we can select the smoothing parameter λ by minimizing BIC.
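
The knot rule quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only: it is written in Python rather than the paper's R, the function name num_knots is hypothetical, the sample sizes are made up, and n is assumed to be whatever sample size the rule is applied to. It does not reproduce the FLoS package or the BIC step.

```python
import math


def num_knots(n: int) -> int:
    """Return K = ceil(1.25 * n**(1/4)) for a sample of size n.

    Hypothetical helper: only the formula comes from the quoted setup.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(1.25 * n ** 0.25)


# Illustrative sample sizes (assumptions, not values from the paper).
for n in (1_000, 10_000, 1_000_000):
    print(f"n = {n}: K = {num_knots(n)}")
```

Because K grows like the fourth root of n, the knot count stays small even for very large samples, which is consistent with the row's remark that K only needs to be large enough to capture local features of β(t).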