reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Patient Risk Stratification with Time-Varying Parameters: A Multitask Learning Approach

Authors: Jenna Wiens, John Guttag, Eric Horvitz

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Applied to a held out set of approximately 25,000 patient admissions, we achieve an area under the receiver operating characteristic curve of 0.81 (95% CI 0.78-0.84). The model has been integrated into the health record system at a large hospital in the US, and can be used to produce daily risk estimates for each inpatient.
Researcher Affiliation	Collaboration	Jenna Wiens EMAIL Computer Science & Engineering University of Michigan Ann Arbor, MI John Guttag EMAIL Department of EECS Massachusetts Institute of Technology Cambridge, MA Eric Horvitz EMAIL Microsoft Research Redmond, WA
Pseudocode	No	The paper describes the problem setup and the learning algorithms using mathematical formulations (Equation 1 and 2), but does not contain a structured pseudocode or algorithm block.
Open Source Code	No	The paper does not provide an explicit statement about releasing its own source code, nor does it provide a link to a code repository. It does mention using LIBLINEAR (Fan et al., 2008), but this is a third-party tool.
Open Datasets	No	We considered all adult inpatient admissions to a large private hospital in the US over a two year period. We leverage the contents of EHRs from over 50,000 patient admissions from a single hospital.
Dataset Splits	Yes	We split the data into a training set and a holdout set based on time, training on data from the first year, and validating our model on data from the second year. The training data consisted of patient admissions from 2011-04-12 to 2012-04-11, totaling 190,675 visit days pertaining to 24,607 unique visits. Within the training data, 258 admissions had a positive test for C. difficile resulting in 2,608 training days with a positive label. [...] The validation, which consisted of patient admissions from 2012-04-12 to 2013-04-12 and was composed of 24,399 admissions of which 242 had a positive test result for C. difficile. [...] To select the hyperparameter C in (1), we performed repeated five-fold cross validation on the training data, choosing a setting that maximized the AUROC.
Hardware Specification	No	The paper does not contain any specific details about the hardware used for running its experiments, such as GPU/CPU models or memory.
Software Dependencies	No	The model parameters, i.e., θ, were solved for using LIBLINEAR (Fan et al., 2008). (This cites a tool, but does not provide a specific version number for this or any other software dependency for replication.)
Experiment Setup	Yes	We selected the number of tasks T, and the corresponding temporal intervals τj for j = 1, ..., T based on the number of training examples available for each interval. For our data, this resulted in six distinct tasks, corresponding to six distinct time periods: D1, D2, D3, D4, D5, D6. [...] To select the hyperparameter C in (1), we performed repeated five-fold cross validation on the training data, choosing a setting that maximized the AUROC.
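
Illustrative sketch (not from the paper, which releases no code): the Dataset Splits row describes a time-based split, and both the Dataset Splits and Experiment Setup rows name AUROC as the selection criterion. The Python below shows one way those two pieces could look. The date boundaries are taken from the quoted text; the `(admission_id, admission_date)` record layout, the function names, and the choice to split on admission date are assumptions made for illustration only.

```python
from datetime import date

# Date boundaries as quoted in the Dataset Splits row.
TRAIN_START = date(2011, 4, 12)
TRAIN_END = date(2012, 4, 11)
VALID_START = date(2012, 4, 12)
VALID_END = date(2013, 4, 12)


def temporal_split(admissions):
    """Partition admissions into (train, holdout) by admission date.

    `admissions` is a hypothetical list of (admission_id, admission_date)
    tuples; the real EHR schema is not described in the source.
    """
    train, holdout = [], []
    for admission_id, admitted in admissions:
        if TRAIN_START <= admitted <= TRAIN_END:
            train.append(admission_id)
        elif VALID_START <= admitted <= VALID_END:
            holdout.append(admission_id)
        # Admissions outside both windows are dropped.
    return train, holdout


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise AUROC is quadratic in the number of examples, so it is only suitable for small illustrations; at the scale quoted above, a rank-based implementation from an established library would be the more practical choice.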