Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
Authors: Taylor W. Killian, Sonali Parbhoo, Marzyeh Ghassemi
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of Distributional Dead-end Discovery (DistDeD) in a toy domain as well as when assessing the risk of severely ill patients in the intensive care unit reaching a point where death is unavoidable. We find that DistDeD significantly improves over prior discovery approaches, providing indications of the risk 10 hours earlier on average as well as increasing detection by 20%. Finally, we provide empirical evidence that our proposed framework enables an earlier determination of high-risk areas of the state space on both a simulated environment and a real application within healthcare of treating patients with sepsis. |
| Researcher Affiliation | Academia | Taylor W. Killian EMAIL University of Toronto, Vector Institute Massachusetts Institute of Technology; Sonali Parbhoo EMAIL Imperial College London; Marzyeh Ghassemi EMAIL Massachusetts Institute of Technology CIFAR AI Chair, Vector Institute |
| Pseudocode | No | The paper describes the DistDeD framework in prose and with a diagram in Figure 2, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code for data extraction and preprocessing as well as for defining and training DistDeD models can be found at https://github.com/MLforHealth/DistDeD. |
| Open Datasets | Yes | We use the MIMIC-IV (Medical Information Mart for Intensive Care; v2.0) database, sourced from the Beth Israel Deaconess Medical Center in Boston, Massachusetts (Johnson et al., 2020). This database contains deidentified treatment records of patients admitted to critical care units (CCU, CSRU, MICU, SICU, TSICU). The citation for MIMIC-IV is: Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2020). MIMIC-IV. PhysioNet. Available online at: https://physionet.org/content/mimiciv/1.0/ (accessed August 23, 2021), 2020. |
| Dataset Splits | Yes | All models are trained with 75% of the data (4,014 surviving patients, 627 patients who died), validated with 5% (268 survivors, 42 nonsurvivors), and we report all results on the remaining held-out 20% (1,070 survivors, 167 nonsurvivors). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Ax, BoTorch, and the Adam optimizer, but does not provide specific version numbers for these or any other key software components used in the experimental setup. |
| Experiment Setup | Yes | For DeD, we model the QD and QR functions using the DDQN architecture (Hasselt et al., 2016) using two layers of 32 nodes with ReLU activations and a learning rate of 1e-3. For DistDeD we utilize IQN architectures (Dabney et al., 2018) for both ZD and ZR using two layers of 32 nodes, ReLU activations, and the same learning rate of 1e-3. For each IQN model, we sample N = N′ = 8 particles from the local and target τ distributions while training and also weight the CQL penalty β = 0.1. When evaluating ZD and ZR, we select K = 1000 particles and set our confidence level to α = 0.1. Additionally, Appendix A.2.1 states: For the encoding neural network, we used 2 layers with 80 hidden units in each with ReLU activations. The output dimension of this encoding network was 55... For optimization, the best learning rate was 5e-4 over 30 epochs. Appendix A.2.2 further states: For the IQN, the projection neural network accepted a 55-dimensional input (from the NCDE), consisted of 2 layers with 16 hidden units in each, using ReLU activations. The number of samples K drawn each optimization step was set to 64. The target network parameters were updated after every 5 optimization steps using an exponentially-weighted moving average with parameter τ set to 0.005. By construction, the discount rate γ is set to 1. For the weighting of the CQL penalty, β = 0.035. For optimization, we used Adam (Kingma & Ba, 2014) with the best performing learning rate found to be 2e-5 over 75 epochs of training. |
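The evaluation setting quoted above (K = 1000 sampled return particles, confidence level α = 0.1) can be illustrated with a small sketch. This is not the paper's implementation: the lower-tail CVaR risk measure, the `flag_dead_end` helper, and the thresholds `delta_d` / `delta_r` are illustrative assumptions standing in for DistDeD's actual risk criterion over the ZD and ZR return distributions.

```python
import numpy as np

def cvar(particles, alpha=0.1):
    """Lower-tail conditional value-at-risk: the mean of the worst
    alpha-fraction of sampled return particles."""
    particles = np.sort(np.asarray(particles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(particles))))  # number of tail particles
    return particles[:k].mean()

def flag_dead_end(z_d_particles, z_r_particles,
                  delta_d=-0.5, delta_r=-0.5, alpha=0.1):
    """Flag a (state, action) pair as high-risk when the alpha-CVaR of
    either return distribution falls below its threshold. The thresholds
    here are hypothetical, not the paper's values."""
    return bool(cvar(z_d_particles, alpha) < delta_d
                or cvar(z_r_particles, alpha) < delta_r)

rng = np.random.default_rng(0)
# K = 1000 particles per distribution, alpha = 0.1, as in the quoted setup.
z_d = rng.uniform(-1.0, 0.0, size=1000)  # hypothetical sampled returns for ZD
z_r = rng.uniform(-1.0, 0.0, size=1000)  # hypothetical sampled returns for ZR
print(flag_dead_end(z_d, z_r, alpha=0.1))
```

Using a tail risk measure rather than the mean is what makes the approach "distributional": a state whose expected return looks acceptable can still be flagged when the worst 10% of sampled outcomes are poor, which is consistent with the paper's claim of earlier detection.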