Medical Dead-ends and Learning to Identify High-Risk States and Treatments
Authors: Mehdi Fatemi, Taylor W. Killian, Jayakumar Subramanian, Marzyeh Ghassemi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then train three independent deep neural models for automated state construction, dead-end discovery and confirmation. Our empirical results discover that dead-ends exist in real clinical data among septic patients, and further reveal gaps between secure treatments and those that were administered. We validate DeD in a carefully constructed toy domain, and then evaluate real health records of septic patients in an intensive care unit (ICU) setting [22]. |
| Researcher Affiliation | Collaboration | Mehdi Fatemi (Microsoft Research); Taylor W. Killian (University of Toronto, Vector Institute); Jayakumar Subramanian (Media and Data Science Research, Adobe India); Marzyeh Ghassemi (Massachusetts Institute of Technology) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and pretrained models to replicate the analysis (including figures) presented in this paper is located at https://github.com/microsoft/med-deadend. |
| Open Datasets | Yes | We use De D to identify medical dead-ends in a cohort of septic patients drawn from the MIMIC (Medical Information Mart for Intensive Care) III dataset (v1.4) [22, 48]. The MIMIC-III databases (DOI: 10.1038/sdata.2016.35) that support the findings of this study are publicly available through Physionet website: https://mimic.physionet.org, which facilitates reproducibility of the presented results. |
| Dataset Splits | Yes | All models are trained with 75% of the patient cohort (14,179 survivors, 1,509 nonsurvivors), validated with 5% (890 survivors, 90 nonsurvivors), and we report all results on the remaining held out 20% (2,660 survivors, 282 nonsurvivors). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Thus, a stratified minibatch of size 64 is constructed of 62 samples from the main data, augmented with 2 samples from this additional buffer, all selected uniformly. This same minibatch structure is used for training each of the three networks. For the training details see Appendix A4 and A5. |
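The dataset-split and minibatch details quoted above can be illustrated with a minimal sketch. The 75% / 5% / 20% patient-level split and the 62-plus-2 stratified minibatch structure come from the paper; the function names, buffer representation, and RNG usage below are assumptions for illustration only, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)


def split_cohort(patient_ids, train=0.75, val=0.05):
    """Split patient IDs into train/validation/test sets.

    The paper reports a 75% / 5% / 20% patient-level split; shuffling
    and any stratification by outcome are assumptions here.
    """
    ids = rng.permutation(patient_ids)
    n = len(ids)
    n_train = int(train * n)
    n_val = int(val * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def stratified_minibatch(main_buffer, extra_buffer, batch_size=64, n_extra=2):
    """Build one minibatch: (batch_size - n_extra) samples drawn uniformly
    from the main data, augmented with n_extra samples drawn uniformly
    from the additional buffer, as described in the quoted setup."""
    main_idx = rng.integers(0, len(main_buffer), size=batch_size - n_extra)
    extra_idx = rng.integers(0, len(extra_buffer), size=n_extra)
    return [main_buffer[i] for i in main_idx] + [extra_buffer[i] for i in extra_idx]
```

The same 62+2 minibatch structure would then be reused when training each of the three networks.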