SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes
Authors: Yishan Shen, Yuyang Ye, Hui Xiong, Yong Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. |
| Researcher Affiliation | Academia | ¹University of Pennsylvania, ²Rutgers University, ³The Hong Kong University of Science and Technology (Guangzhou). Correspondence to: Hui Xiong <EMAIL>, Yong Chen <EMAIL>. |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and dataset are available at https://github.com/yishanssss/SAFER. |
| Open Datasets | Yes | Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. For this study, we define cohorts based on the Sepsis-3 criteria (Singer et al., 2016), focusing on the early stages of sepsis management 24 hours prior to and 48 hours after sepsis onset. The treatment selection involves intravenous fluid and vasopressor dosage within a 4-hour window, mapped to a 5×5 medical intervention space, following Komorowski et al. (2018). Figure 2 shows the distribution of sepsis treatment co-occurrence in the two cohorts. |
| Dataset Splits | Yes | The two datasets were randomly split into training, calibration (validation), and test sets in an 80%/10%/10% ratio via patient-level splits to ensure no patient overlap, under the assumption that the entire dataset is i.i.d. sampled from a common distribution. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | Specifically, we use Bio_ClinicalBERT² to encode clinical notes, modeled as X_W (Alsentzer et al., 2019), which provides superior performance in encoding clinical text due to its bidirectional attention mechanism and domain-specific pretraining on large-scale biomedical and clinical corpora (Huang et al., 2023; Hu et al., 2024; Zhang et al., 2022). Footnote 2: https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT. While Bio_ClinicalBERT is mentioned, no specific version number for this or any other software dependency is provided. |
| Experiment Setup | Yes | Appendix C.3 provides a sensitivity analysis of several hyperparameters, including the length of historical information, hidden dimension, and γ in the loss function. Historical sequence length L: ... we set the sequence length to 8 for all experiments. Hidden dimensionality hd: ... we choose 128 to reduce model parameters and improve computational efficiency. γ in the Loss Function: Figure 10 illustrates the performance of SAFER under different γ values, guiding the selection of an optimal γ. |
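The 5×5 medical intervention space quoted above (IV fluid × vasopressor dose within a 4-hour window, following Komorowski et al., 2018) can be sketched as a quartile-based binning: bin 0 for zero dose and bins 1–4 split at the quartiles of the nonzero doses. The function names and the bin edges below are illustrative assumptions, not taken from the paper or its code.

```python
def dose_to_bin(dose, nonzero_quartiles):
    """Map a dose to one of 5 bins: bin 0 for a zero dose, bins 1-4 by
    quartile edges of the nonzero doses (Komorowski et al., 2018 style).
    `nonzero_quartiles` is the list of 3 internal quartile edges."""
    if dose == 0:
        return 0
    for i, edge in enumerate(nonzero_quartiles, start=1):
        if dose <= edge:
            return i
    return 4  # above the 75th percentile of nonzero doses

def action_index(iv_dose, vaso_dose, iv_quartiles, vaso_quartiles):
    """Combine the two 5-level dose bins into one index in the 5x5 space."""
    return dose_to_bin(iv_dose, iv_quartiles) * 5 + dose_to_bin(vaso_dose, vaso_quartiles)
```

With this encoding, action 0 corresponds to "no IV fluid, no vasopressor" and action 24 to the highest bin of both, giving the 25 discrete treatments the paper's recommendation metrics are computed over.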
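The 80%/10%/10% patient-level split described under "Dataset Splits" can be sketched as follows. This is a minimal illustration of splitting by patient ID so no patient appears in more than one set; the record format and function name are assumptions, and the SAFER repository contains the authors' actual split code.

```python
import random

def patient_level_split(records, train_frac=0.8, cal_frac=0.1, seed=0):
    """Split (patient_id, data) records into train/calibration/test sets
    at the patient level, so no patient spans two sets. Fractions apply
    to patients, not records (a simplifying assumption)."""
    ids = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_cal = int(len(ids) * cal_frac)
    groups = {
        "train": set(ids[:n_train]),
        "cal": set(ids[n_train:n_train + n_cal]),
        "test": set(ids[n_train + n_cal:]),
    }
    return {name: [r for r in records if r[0] in pids]
            for name, pids in groups.items()}
```

Splitting on patient IDs rather than individual records is what enforces the "no patient overlap" condition the paper states; a record-level shuffle would leak a patient's trajectory across sets.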