reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning-to-defer for sequential medical decision-making under uncertainty

Authors: Shalmali Joshi, Sonali Parbhoo, Finale Doshi-Velez

TMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that adaptive deferral via SLTD provides an improved trade-off between long-term outcomes and deferral frequency on synthetic, semi-synthetic, and real-world data with non-stationary dynamics. Finally, we interpret the deferral decision by decomposing the propagated (long-term) uncertainty around the outcome, to justify the deferral decision.
Researcher Affiliation	Academia	Shalmali Joshi EMAIL Columbia University Sonali Parbhoo EMAIL Imperial College London Finale Doshi-Velez finale@seas.harvard.edu Harvard University
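Pseudocode	Yes	Algorithm 1: Sequential Learning to Defer. Input: $D$, expert policy $\pi_0$, target policy $\pi_{tar}$. Estimate posterior distributions $\{M_t \sim p_t(\cdot \mid D)\}_{t=0}^{T}$ (posteriors over rewards not shown here). Initialization: deferral function $g_{\pi_{tar}}(s, t) = 0$ for all $s \in S$ and $t \in \{1, 2, \dots, T\}$. for $n \in \mathrm{BOOTSTRAPS}(D)$ do: Sample $M_k \equiv \{M_{k,t} \sim p_t(\cdot \mid D)\}$, $t \in \{1, 2, \dots, K\}$; for $t \in \{T, T-1, \dots, 1\}$ do: Compute $V^{M}_{\pi_{tar}(t),\mathrm{mix}(t+)}$ and $V^{M}_{\pi_0(t),\mathrm{mix}(t+)} - c$ $\forall M$; $\hat{g}_{\pi_{tar}}(s, t) \leftarrow \frac{1}{K} \sum_{M_k \sim \{p_{t'}(\cdot \mid D)\}_{t'=t}^{T}} \left[\mathbb{1}\left(V^{M_k}_{\pi_{tar}(t),\mathrm{mix}(t+)} < V^{M_k}_{\pi_0(t),\mathrm{mix}(t+)} - c\right)\right]$; end for; end for; end for. return $g_{\pi_{tar}}(s, t) = \mathbb{1}(\hat{g}_{\pi_{tar}}(s, t) > \tau)$ $\forall (s, t) \in S \times \{1, 2, \dots, T\}$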
Open Source Code	No	The paper mentions using "an open-source implementation of the FDA-approved Type-1 Diabetes Mellitus simulator (T1DMS)" and provides a link: "Jinyu Xie. Simglucose v0.2.1 (2018) [Online]. Available: https://github.com/jxx123/simglucose. Accessed on: 07-24-2021." However, this refers to a third-party tool used, not the authors' own implementation code for the methodology described in this paper.
Open Datasets	Yes	Real-world: HIV Data. We identified individuals between 18–72 years of age from the EuResist database (Zazzi et al., 2012) comprising of genotype, phenotype, and clinical information of over 65,000 individuals in response to antiretroviral therapy administered between 1983–2018.
Dataset Splits	No	The paper describes data characteristics for synthetic, diabetes, and HIV datasets, such as episode length, number of patients, and aggregation intervals, and mentions using 10000 trajectories for evaluation. However, it does not provide specific train/test/validation split percentages, sample counts, or explicit methodologies for partitioning the data for model training and evaluation.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, memory specifications, or cloud computing instance types.
Software Dependencies	Yes	Jinyu Xie. Simglucose v0.2.1 (2018) [Online]. Available: https://github.com/jxx123/simglucose.
Experiment Setup	No	The paper describes the setup of the synthetic and real-world datasets (e.g., episode length 15, 13 glucose states, 25 actions, 100 continuous states). It mentions parameters like 'cost c' and 'threshold τ' and that 'c' should be tuned, but does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs for Q-learning or other models), optimizer settings, or a detailed training configuration table for their methods or baselines.
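
The thresholding step quoted in the Pseudocode row can be illustrated with a minimal sketch. It assumes the per-model value estimates for the target policy and the expert policy are already available as nested lists indexed by sampled model, state, and time step; it does not cover posterior estimation, bootstrapping, or the backward value computation. The function name `deferral_decisions` and its argument layout are hypothetical and do not come from the paper, which releases no code.

```python
from typing import List, Tuple

Values = List[List[List[float]]]  # indexed as [model k][state s][time step t]


def deferral_decisions(
    v_target: Values, v_expert: Values, cost: float, tau: float
) -> Tuple[List[List[int]], List[List[float]]]:
    """Hypothetical sketch of the final thresholding step only.

    For each (state, time step), compute the fraction of sampled models under
    which the target policy's value is lower than the expert's value minus the
    deferral cost c, then defer (1) when that fraction exceeds tau.
    """
    num_models = len(v_target)
    num_states = len(v_target[0])
    num_steps = len(v_target[0][0])

    soft = [[0.0] * num_steps for _ in range(num_states)]
    hard = [[0] * num_steps for _ in range(num_states)]
    for s in range(num_states):
        for t in range(num_steps):
            # Count the sampled models that favour deferring to the expert.
            votes = sum(
                1
                for k in range(num_models)
                if v_target[k][s][t] < v_expert[k][s][t] - cost
            )
            soft[s][t] = votes / num_models
            hard[s][t] = 1 if soft[s][t] > tau else 0
    return hard, soft


# Toy inputs (made up for illustration): 4 sampled models, 2 states, 1 time step.
v_tar = [[[1.0], [5.0]], [[1.0], [5.0]], [[1.0], [0.0]], [[4.0], [5.0]]]
v_exp = [[[3.0], [3.0]] for _ in range(4)]
decisions, fractions = deferral_decisions(v_tar, v_exp, cost=0.5, tau=0.5)
print(decisions, fractions)
```

In this toy input, three of the four sampled models favour deferral in the first state and one of four in the second, so with τ = 0.5 only the first state is marked for deferral. The values of c and τ here are arbitrary; the Experiment Setup row notes that the paper does not report the values it used.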