Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap
Authors: Indrila Ganguly, Srijan Sengupta, Sujit Ghosh
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the empirical performance of SRB via simulation studies and a real data analysis of the Forest Covertype data from the UCI Machine Learning Repository. |
| Researcher Affiliation | Academia | Indrila Ganguly, Biostatistics, Bioinformatics and Epidemiology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Srijan Sengupta, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7103, USA; Sujit Ghosh, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7103, USA |
| Pseudocode | Yes | Figure 1: Comparison of Residual Bootstrap and Subsampled Residual Bootstrap methods for GLMs |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the methodology described. |
| Open Datasets | Yes | We used the proposed SRB method to analyze the Forest Cover type data obtained from UCI Machine Learning Repository (Blackard, 1998). |
| Dataset Splits | No | The paper mentions using a subset of the data for real data analysis (n = 495,141 observations) and generating data for simulations, but it does not specify any training/test/validation dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper states that logistic and Poisson regression were fit with the glm() function from the stats package in R, using the default starting values for the iteratively re-weighted least squares procedure, but it does not specify R or package versions. |
| Experiment Setup | Yes | For each GLM setting, we generated M = 48 data sets and carried out B = 25 iterations of SRB and RB for each data set. This choice of M and B ensures that the standard error of the average error rate is below 0.01 (see the Appendix for a proof). Each iteration of SRB or RB involves R = 100 resamples. For SRB, we take b = n^γ with γ ∈ {0.5, 0.6, 0.7, 0.8, 0.9}. |
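The experiment-setup row describes the core SRB recipe: fit the model once on the full sample, then build each of the R bootstrap replicates from a subsample of size b = n^γ with resampled residuals. The sketch below is a simplified numpy illustration of that scheme for a plain linear model (the paper treats GLMs fit via R's glm(); the data, γ, and R values here are illustrative only, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical, for illustration only)
n, p = 10_000, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

# Full-sample fit and residuals (least squares stands in for the GLM fit)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Subsampled residual bootstrap: each replicate refits on a
# subsample of size b = n^gamma, with residuals resampled
# (with replacement) from the full-sample residuals
gamma, R = 0.7, 100
b = int(n ** gamma)
boot = np.empty((R, p))
for r in range(R):
    idx = rng.choice(n, size=b, replace=False)        # subsample rows
    e_star = rng.choice(resid, size=b, replace=True)  # resample residuals
    y_star = X[idx] @ beta_hat + e_star               # reconstruct responses
    boot[r], *_ = np.linalg.lstsq(X[idx], y_star, rcond=None)

# Bootstrap standard errors from the replicate estimates
se = boot.std(axis=0)
```

Because each replicate refits on only b = n^γ ≪ n rows, the per-resample cost shrinks accordingly, which is the computational gain the subsampled variant targets over the full residual bootstrap.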