Inference for the Case Probability in High-dimensional Logistic Regression

Authors: Zijian Guo, Prabrisha Rakshit, Daniel S. Herman, Jinbo Chen

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.
Researcher Affiliation | Academia | Zijian Guo (EMAIL) and Prabrisha Rakshit (EMAIL), Department of Statistics, Rutgers University, Piscataway, New Jersey, USA; Daniel S. Herman (EMAIL), Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Jinbo Chen (EMAIL), Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Pseudocode | Yes | We provide details on how to implement the LiVE estimator defined in (7). The initial estimator $\widehat{\beta}$ defined in (3) is computed with the cv.glmnet function of the R package glmnet (Friedman et al., 2010), with the tuning parameter $\lambda$ chosen by cross-validation. To compute the projection direction $\widehat{u} \in \mathbb{R}^p$, we implement the following constrained optimization:
$$\widehat{u} = \arg\min_{u \in \mathbb{R}^p} u^\top \widehat{\Sigma} u \quad \text{subject to} \quad \|\widehat{\Sigma} u - x_*\|_\infty \le \|x_*\|_2 \lambda_n, \quad \left| x_*^\top \widehat{\Sigma} u - \|x_*\|_2^2 \right| \le \|x_*\|_2^2 \lambda_n. \qquad (27)$$
This construction does not include the constraint (11), which is imposed mainly to facilitate the theoretical proof. As an additional check in simulations, we observed that the constructed $\widehat{u}$ in (27) satisfies $\|X \widehat{u}\|_\infty \le C \log n \, \|x_*\|_2$; see Section C.2 of the supplementary material for details. We solve the dual problem of (27),
$$\widehat{v} = \arg\min_{v \in \mathbb{R}^{p+1}} \frac{1}{4} v^\top H^\top \widehat{\Sigma} H v + b^\top H v + \lambda_n \|v\|_1 \quad \text{with} \quad H = [b, I_{p \times p}], \quad b = \frac{1}{\|x_*\|_2} x_*, \qquad (28)$$
and then recover the solution of the primal problem (27) as $\widehat{u} = -(\widehat{v}_{-1} + \widehat{v}_1 b)/2$, where $\widehat{v}_1$ is the first entry of $\widehat{v}$ and $\widehat{v}_{-1}$ collects the remaining $p$ entries. We refer to Proposition 2 in Cai et al. (2019) for the detailed derivation of the dual problem (28). When $\widehat{\Sigma}$ is singular and the tuning parameter $\lambda_n > 0$ is sufficiently close to 0, the dual problem cannot be solved, because its objective is unbounded below and the minimum value diverges to negative infinity. Hence, we choose the smallest $\lambda_n > 0$ for which the dual problem has a finite minimum value. The tuning parameter $\lambda_n$ selected in this manner is of the order $\sqrt{\log p / n}$; the ratio $\lambda_n / \sqrt{\log p / n}$ is investigated in Section C.1 of the supplement.
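The dual (28) is a lasso-type problem, so a small solver makes the primal recovery step concrete. Below is a minimal pure-Python sketch, not the SIHR implementation. It assumes $\widehat{\Sigma}$ is passed as a dense symmetric positive semidefinite list of lists and that $\lambda_n$ is fixed in advance, and it uses cyclic coordinate descent with soft-thresholding, which is our choice of solver rather than one stated in the paper. The helper names `soft_threshold` and `projection_direction` are hypothetical. Because $b = x_*/\|x_*\|_2$ is the normalized loading, the vector $-(\widehat{v}_{-1} + \widehat{v}_1 b)/2$ satisfies the constraints of (27) with $b$ in place of $x_*$; the sketch multiplies it by $\|x_*\|_2$ so that the output meets (27) as written for $x_*$ (this rescaling is our own step).

```python
import math


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def projection_direction(sigma, x, lam, max_iter=5000, tol=1e-12):
    """Hypothetical helper: solve dual (28) by coordinate descent, return u for (27).

    sigma: symmetric PSD p x p matrix (list of lists), e.g. a sample covariance.
    x: loading vector x_* of length p.  lam: tuning parameter lambda_n > 0.
    """
    p = len(x)
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    b = [xi / norm_x for xi in x]                                 # b = x_* / ||x_*||_2
    sb = [sum(s * bj for s, bj in zip(row, b)) for row in sigma]  # Sigma b
    bsb = sum(bi * si for bi, si in zip(b, sb))                   # b^T Sigma b
    # With H = [b, I_p]: M = H^T Sigma H, and c = H^T b = (1, b) because ||b||_2 = 1.
    m = [[bsb] + sb] + [[sb[i]] + list(sigma[i]) for i in range(p)]
    c = [1.0] + b
    v = [0.0] * (p + 1)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p + 1):
            # Objective in v_j alone: (M_jj / 4) v_j^2 + (r_j / 2 + c_j) v_j + lam |v_j|.
            r = sum(m[j][k] * v[k] for k in range(p + 1) if k != j)
            new_vj = soft_threshold(-(r / 2.0 + c[j]), lam) / (m[j][j] / 2.0)
            max_change = max(max_change, abs(new_vj - v[j]))
            v[j] = new_vj
        if max_change < tol:
            break
    # Primal recovery -(v_{-1} + v_1 b) / 2 is the direction for the unit loading b;
    # multiplying by ||x_*||_2 (our own rescaling) matches the constraints of (27).
    return [-norm_x * (v[i + 1] + v[0] * b[i]) / 2.0 for i in range(p)]


# Toy check with a small positive-definite matrix (illustrative values only).
sigma = [[0.5 ** (1 + abs(j - l)) for l in range(4)] for j in range(4)]
x_star = [1.0, 0.5, 0.0, -1.0]
lam = 0.1
u = projection_direction(sigma, x_star, lam)
su = [sum(s * ui for s, ui in zip(row, u)) for row in sigma]
norm2 = sum(xi * xi for xi in x_star)
print(max(abs(a - xi) for a, xi in zip(su, x_star)) <= math.sqrt(norm2) * lam + 1e-6)
print(abs(sum(xi * a for xi, a in zip(x_star, su)) - norm2) <= norm2 * lam + 1e-6)
```

The two printed checks test primal feasibility of the recovered direction on the toy matrix. They are expected to hold because, at a dual optimum of (28), the optimality conditions imply $\|H^\top(b - \widehat{\Sigma}\widetilde{u})\|_\infty \le \lambda_n$ for $\widetilde{u} = -H\widehat{v}/2$, which is exactly the pair of constraints in (27) written for $b$.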
Open Source Code | Yes | Our proposed LiVE estimator has been implemented in the R package SIHR, which is available from CRAN.
Open Datasets | No | We demonstrate the proposed method using Penn Medicine EHR data to identify patients with hypertension and two subsets thereof that should be screened for primary aldosteronism (PA), per specialty guidelines. The data were extracted from the Penn Medicine clinical data repository, including demographics, laboratory results, medication prescriptions, vital signs, and encounter meta information. The paper does not provide concrete access information (link, DOI, repository, or formal citation for public access) for the Penn Medicine EHR data.
Dataset Splits | Yes | In our analysis, we randomly sampled 30 patients as the test sample and treated their predictor vectors as $x_*$. A prediction model for each outcome variable was developed using the remaining 318 patients and then applied to the test sample to obtain bias-corrected estimates of the case probabilities using our method.
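The hold-out split described in this row can be sketched as follows. This assumes the cohort is simply the 30 + 318 = 348 patients mentioned and that the split is a uniform random draw without stratification; the helper `split_patients`, the integer patient IDs, and the seed are hypothetical placeholders, not details from the paper.

```python
import random


def split_patients(patient_ids, n_test=30, seed=2021):
    """Hypothetical helper: hold out n_test patients uniformly at random."""
    rng = random.Random(seed)               # arbitrary seed, for reproducibility only
    test = rng.sample(patient_ids, n_test)  # sampling without replacement
    test_set = set(test)
    train = [pid for pid in patient_ids if pid not in test_set]
    return train, test


ids = list(range(348))  # placeholder IDs: 30 test + 318 training patients
train_ids, test_ids = split_patients(ids)
print(len(train_ids), len(test_ids))  # prints: 318 30
```

Each held-out patient's predictor vector would then play the role of $x_*$, while the model is fit on `train_ids` only.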
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions several R components (the cv.glmnet function from glmnet, and the hdi and SIHR packages) and algorithms (WLDP) but does not provide specific version numbers for these software components or for R itself.
Experiment Setup | Yes | The initial estimator $\widehat{\beta}$ defined in (3) is computed with the cv.glmnet function of the R package glmnet (Friedman et al., 2010), with the tuning parameter $\lambda$ chosen by cross-validation. The projection direction $\widehat{u} \in \mathbb{R}^p$ is computed from the constrained optimization (27), with $\lambda_n$ chosen as the smallest value for which the dual problem has a finite minimum value; $\lambda_n$ selected in this manner is of the order $\sqrt{\log p / n}$, and the ratio $\lambda_n / \sqrt{\log p / n}$ is investigated in Section C.1 of the supplement. In the simulations, we set $p = 501$, $\Sigma = \{0.5^{1+|j-l|}\}_{1 \le j, l \le p-1}$, and vary $n \in \{200, 400, 600\}$.
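A minimal sketch of this simulation design, under assumptions the excerpt does not state: the first column of the design is an intercept (our reading of the $1 \le j, l \le p-1$ indexing with $p = 501$), and the $p - 1 = 500$ covariates are Gaussian with covariance $\Sigma$. Instead of a Cholesky factorization, the sketch uses a swapped-in AR(1) recursion: a unit-variance AR(1) series with coefficient 0.5 has correlations $0.5^{|j-l|}$, and scaling it by $\sqrt{0.5}$ yields covariances $0.5^{1+|j-l|}$. The names `toeplitz_sigma`, `simulate_design`, and `reference_scale` and the seeds are hypothetical; `reference_scale` only computes $\sqrt{\log p / n}$, the order stated for $\lambda_n$, not the selected $\lambda_n$ itself.

```python
import math
import random


def toeplitz_sigma(d, base=0.5):
    """Sigma[j][l] = base ** (1 + |j - l|) for 0 <= j, l < d."""
    return [[base ** (1 + abs(j - l)) for l in range(d)] for j in range(d)]


def simulate_design(n, p, base=0.5, seed=0):
    """Hypothetical: n rows = intercept + (p - 1) Gaussian covariates ~ N(0, Sigma).

    A unit-variance AR(1) series with coefficient `base` has correlation
    base ** |j - l|; scaling by sqrt(base) gives covariance base ** (1 + |j - l|).
    """
    rng = random.Random(seed)
    scale = math.sqrt(base)
    innov_sd = math.sqrt(1.0 - base ** 2)  # keeps each AR(1) coordinate at variance 1
    X = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        row = [1.0, scale * z]             # intercept, then the first covariate
        for _ in range(p - 2):
            z = base * z + innov_sd * rng.gauss(0.0, 1.0)
            row.append(scale * z)
        X.append(row)
    return X


def reference_scale(p, n):
    """The order sqrt(log p / n) at which lambda_n is reported to fall."""
    return math.sqrt(math.log(p) / n)


P = 501
for n in (200, 400, 600):
    X = simulate_design(n, P, seed=n)
    print(n, len(X), len(X[0]), round(reference_scale(P, n), 4))
```

The AR(1) recursion costs $O(p)$ per row, which keeps pure-Python generation of the $n \times 501$ designs fast; with a numerical library available, sampling from `toeplitz_sigma(500)` directly would be an equivalent alternative.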