Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap

Authors: Indrila Ganguly, Srijan Sengupta, Sujit Ghosh

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the empirical performance of SRB via simulation studies and a real data analysis of the Forest Covertype data from the UCI Machine Learning Repository.
Researcher Affiliation | Academia | Indrila Ganguly, Biostatistics, Bioinformatics and Epidemiology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Srijan Sengupta, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7103, USA; Sujit Ghosh, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7103, USA
Pseudocode | Yes | Figure 1: Comparison of Residual Bootstrap and Subsampled Residual Bootstrap methods for GLMs
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology described.
Open Datasets | Yes | We used the proposed SRB method to analyze the Forest Cover type data obtained from UCI Machine Learning Repository (Blackard, 1998).
Dataset Splits | No | The paper mentions using a subset of the data for real data analysis (n = 495,141 observations) and generating data for simulations, but it does not specify any training/test/validation dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | To perform logistic and Poisson regression, we employed the glm() function from the stats package in R, using the default starting values for the iteratively re-weighted least squares procedure.
Experiment Setup | Yes | For each GLM setting, we generated M = 48 data sets and carried out B = 25 iterations of SRB and RB for each data set. This choice of M and B ensures that the standard error of the average error rate is below 0.01 (see the Appendix for a proof). Each iteration of SRB or RB involves R = 100 resamples. For SRB, we take b = n^γ with γ ∈ {0.5, 0.6, 0.7, 0.8, 0.9}.
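The arithmetic behind the experiment setup row can be made concrete with a minimal sketch. It tabulates the subsample sizes b = n^γ for the quoted γ grid and counts the resamples drawn per method in one GLM setting. The function names are hypothetical, flooring b to an integer is an assumption (the summary does not say how b is rounded), and n = 495,141 is borrowed from the real data analysis purely for illustration; the summary does not state which n the simulations used.

```python
import math

# Values quoted in the experiment setup row.
M = 48    # simulated data sets per GLM setting
B = 25    # iterations of SRB and RB per data set
R = 100   # resamples per iteration
GAMMAS = (0.5, 0.6, 0.7, 0.8, 0.9)


def subsample_sizes(n, gammas=GAMMAS):
    """Return {gamma: b} with b = floor(n ** gamma).

    Flooring is an assumption of this sketch, not a detail taken
    from the paper.
    """
    return {g: math.floor(n ** g) for g in gammas}


def total_resamples(n_datasets=M, n_iterations=B, n_resamples=R):
    """Resamples drawn per method for one GLM setting."""
    return n_datasets * n_iterations * n_resamples


if __name__ == "__main__":
    n = 495_141  # illustrative n, taken from the real data analysis
    for gamma, b in subsample_sizes(n).items():
        print(f"gamma = {gamma}: b = {b}")
    print(f"resamples per method per setting: {total_resamples()}")
```

Even at γ = 0.9 the subsample is much smaller than n, which is where a subsampled scheme would be expected to save computation relative to resampling all n observations.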