reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Gradient-based Explanations for Deep Learning Survival Models

Authors: Sophie Hanna Langbein, Niklas Koenen, Marvin N. Wright

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic data show that gradient-based methods capture the magnitude and direction of local and global feature effects, including time dependencies. We introduce GradSHAP(t), a gradient-based counterpart to SurvSHAP(t), which outperforms SurvSHAP(t) and SurvLIME in a computational speed vs. accuracy trade-off. Finally, we apply these methods to medical data with multi-modal inputs, revealing relevant tabular features and visual patterns, as well as their temporal dynamics. (Section 5: Experiments)
Researcher Affiliation	Academia	(1) Leibniz Institute for Prevention Research and Epidemiology BIPS, Bremen, Germany; (2) Faculty of Mathematics and Computer Science, University of Bremen, Germany; (3) Department of Public Health, University of Copenhagen, Denmark. Correspondence to: Marvin N. Wright <EMAIL>.
Pseudocode	No	The paper describes methods and mathematical representations but does not include any clearly labeled pseudocode or algorithm blocks. The information is insufficient to classify it as containing pseudocode.
Open Source Code	Yes	All methods and visualization tools are implemented in our open-source R package survinng (footnote 1: https://github.com/bips-hb/survinng), which supports torch-based survival models from survivalmodels (Sonabend, 2024) and PyTorch models trained in pycox (Kvamme et al., 2019). All simulations and real-data examples presented in this manuscript, along with the corresponding code, are available in our GitHub repository at https://github.com/bips-hb/Survival-XAI-ICML/ to ensure transparency and reproducibility.
Open Datasets	Yes	Finally, we apply these methods to medical data with multi-modal inputs, revealing relevant tabular features and visual patterns, as well as their temporal dynamics. [...] using the methods to a CNN-based extension of a DeepHit model trained on a real-world multi-modal medical dataset predicting overall survival in diffuse gliomas (Mobadersany et al., 2018). [...] from The Cancer Genome Atlas (TCGA) Lower-Grade Glioma (LGG) and Glioblastoma (GBM) projects. The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Dataset Splits	Yes	The data consist of N = 10,000 observations simulated from a standard Cox PH model. [...] we split the data into training (9,500 observations) and test set (500 observations) and fit a DeepSurv (Katzman et al., 2018), a CoxTime (Kvamme et al., 2019), and a DeepHit (Lee et al., 2018) model to the training set. (Section 5.1.1). This multi-modal model is trained on a total of 1,239 training samples and evaluated on 266 test samples (Section 5.3).
Hardware Specification	Yes	A 64-bit Linux platform running Ubuntu 22.04 LTS with two AMD EPYC Genoa 9534 64-Core Processors (128 cores, 256 threads total), 1.5 terabytes RAM, and eight NVIDIA RTX 6000 Ada Generation GPUs (each with 48 GB memory) was used for all computations. (Appendix A.4 Computational Details)
Software Dependencies	Yes	All methods and visualization tools are implemented in our open-source R package survinng (footnote 1: https://github.com/bips-hb/survinng), which supports torch-based survival models from survivalmodels (Sonabend, 2024) and PyTorch models trained in pycox (Kvamme et al., 2019). [...] survivalmodels: Models for Survival Analysis, 2024. URL https://CRAN.R-project.org/package=survivalmodels. R package version 0.1.191.
Experiment Setup	Yes	For all experiments, we use our R package survinng. [...] and fit a DeepSurv (Katzman et al., 2018), a CoxTime (Kvamme et al., 2019), and a DeepHit (Lee et al., 2018) model to the training set. [...] using 500 epochs, early stopping, a batch size of 1,024 and a dropout probability of 0.1 applied to all layers. For any other hyperparameters, including the activations, the default values set in the pycox (Kvamme et al., 2019) Python package are used (Section 5.1.1).
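
The Dataset Splits row describes N = 10,000 observations simulated from a standard Cox PH model and split into 9,500 training and 500 test observations. The following is a minimal sketch of what such a setup could look like, not the authors' simulation code. The covariate distribution, the coefficients in `beta`, the constant baseline hazard, the censoring rate, and the function names are all assumptions made for illustration; only the sample sizes come from the excerpt above.

```python
import math
import random


def simulate_cox_ph(n=10_000, beta=(0.5, -0.5, 0.0),
                    baseline_hazard=0.1, censor_rate=0.05, seed=0):
    """Draw (covariates, observed time, event indicator) rows.

    Assumptions (not from the paper): standard-normal covariates, a constant
    baseline hazard, and independent exponential censoring.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        # Proportional hazards: h(t | x) = h0 * exp(x . beta).
        # With constant h0 the event time is exponential with this rate.
        rate = baseline_hazard * math.exp(sum(b * xi for b, xi in zip(beta, x)))
        event_time = rng.expovariate(rate)
        censor_time = rng.expovariate(censor_rate)
        observed = min(event_time, censor_time)
        event = int(event_time <= censor_time)  # 1 = event, 0 = censored
        rows.append((x, observed, event))
    return rows


def split(rows, n_test=500, seed=0):
    """Shuffle a copy of the rows and hold out n_test of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


data = simulate_cox_ph()
train, test = split(data)
print(len(train), len(test))  # 9500 500
```

A fixed seed keeps the split reproducible across runs. The models named in the table would then be fit to `train` and explained on `test`.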