Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping
Authors: Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital. [...] We conducted extensive simulation studies to examine the finite-sample performance of the PASS estimator and to compare it with existing approaches. [...] We examine the performance of PASS along with other approaches in three real world EHR phenotyping studies with the goal of developing classification models for the diseases of interest. |
| Researcher Affiliation | Academia | Yichi Zhang, Department of Computer Science and Statistics, University of Rhode Island; Molei Liu, Department of Biostatistics, Harvard T.H. Chan School of Public Health; Matey Neykov, Department of Statistics and Data Science, Carnegie Mellon University; Tianxi Cai, Department of Biostatistics, Harvard T.H. Chan School of Public Health |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations but does not include any distinct pseudocode blocks or algorithms. |
| Open Source Code | Yes | R codes for implementing PASS and the benchmark methods, and replicating the simulation results can be found at https://github.com/moleibobliu/PASS. |
| Open Datasets | Yes | This de-identified dataset has been analyzed in previous studies (Zhang et al., 2019, e.g.) and is publicly available online: https://celehs.github.io/PheCAP/articles/example2.html. |
| Dataset Splits | Yes | For each choice of ϑ̃_W, we consider the area under the receiver operating characteristic curve (AUC) for classifying Y, the excess risk (ER) as defined in Section 3, and the mean squared error of the predicted probabilities (MSE-P), which is the mean squared difference between the predicted probability and the true probability. We summarize results based on 1000 simulated datasets for each configuration. [...] First, we randomly split the labelled samples into four folds of equal sizes. Then we pick each fold as the validation set, sample n training labels from the other three folds for 20 times, train and validate the algorithms, and finally average the evaluation metrics and their standard errors over the validation results on the four folds. We replicate this procedure 10 times and report the average performance. |
| Hardware Specification | No | The paper does not specify any particular hardware (GPU, CPU models, etc.) used for running the experiments. |
| Software Dependencies | No | In this paper, we use the R package glmnet (Friedman et al., 2010) to compute ζ̂, γ̂, ρ̂, and δ̂, and construct the final estimator for ϑ₀ as ϑ̂ = (ζ̂, γ̂, β̂) with β̂ = δ̂ + ρ̂α̂. The version number for the R package 'glmnet', or for R itself, is not specified. |
| Experiment Setup | Yes | Throughout, we let N = 10000 and let ν = 1 in the ALASSO weights. We use the Bayesian information criterion (BIC) to select µ_init and µ in the estimation of α due to large N, and use 10-fold cross-validation to select λ1, λ2 for the estimation of β, so that the phenotype model is tuned towards prediction performance. [...] For the size of training labels, we consider n = 50, 70, 90. [...] For the size of training labels, we consider n = 50, 125, 200. [...] For the size of training labels, we consider n = 50, 85, 120. |
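The labelled-data evaluation protocol quoted in the Dataset Splits row (four equal folds, each held out in turn; n training labels resampled 20 times from the remaining folds; the whole procedure replicated 10 times) is concrete enough to sketch. Below is a minimal illustration in Python rather than the paper's R; `fit` and `score` are hypothetical placeholders for a classifier and an evaluation metric such as AUC, not functions from the released code.

```python
import numpy as np

def evaluate_protocol(X, y, n_train, fit, score,
                      n_folds=4, n_resamples=20, n_replicates=10, seed=0):
    """Sketch of the paper's labelled-data evaluation protocol:
    split the labelled samples into n_folds equal folds, hold out each
    fold as the validation set, resample n_train training labels from
    the other folds n_resamples times, and average the metric over all
    runs; the whole procedure is replicated n_replicates times."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_replicates):
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, n_folds)
        for k in range(n_folds):
            val = folds[k]
            pool = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            for _ in range(n_resamples):
                train = rng.choice(pool, size=n_train, replace=False)
                model = fit(X[train], y[train])
                scores.append(score(model, X[val], y[val]))
    scores = np.asarray(scores)
    # Average performance and its standard error across all runs.
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
```

Any classifier can be plugged in through `fit`/`score`; with the paper's smallest setting, `n_train=50` is drawn from the three non-validation folds on each resample.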
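The Experiment Setup row notes that the penalty parameters µ_init and µ are chosen by BIC because N is large. The mechanics of BIC-based penalty selection can be illustrated with a stand-in: the paper fits an adaptive lasso via the R package glmnet, whereas the sketch below uses scikit-learn's `LassoLarsIC` (a plain lasso tuned by BIC) on synthetic data; the data-generating values here are illustrative, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

# Synthetic stand-in for a large sample (N = 10000, matching the paper's N).
rng = np.random.default_rng(1)
N, p = 10000, 20
X = rng.standard_normal((N, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.25]  # three true signals, rest are noise
y = X @ beta_true + rng.standard_normal(N)

# Select the lasso penalty by minimizing BIC along the LARS path,
# mirroring the paper's BIC-based choice of tuning parameter at large N.
model = LassoLarsIC(criterion="bic").fit(X, y)
print("selected penalty:", model.alpha_)
print("nonzero coefficients:", np.flatnonzero(model.coef_))
```

At this sample size the BIC penalty log(N) is easily outweighed by the likelihood gain from the true signals, so the three nonzero coefficients are retained while most noise variables are dropped.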