reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Doubly Robust Conformalized Survival Analysis with Right-Censored Data

Authors: Matteo Sesia, Vladimir Svetnik

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical studies on simulated and real data demonstrate that our method leads to relatively informative predictive inferences and is especially robust in challenging settings where the survival model may be inaccurate. ... 4. Numerical Experiments ... 5. Application to Real Data
Researcher Affiliation	Collaboration	¹Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA, USA. ²Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, USA. ³Merck & Co., Inc., Rahway, NJ, USA. Correspondence to: Matteo Sesia <EMAIL>.
Pseudocode	Yes	Algorithm 1 Imputation of Latent Censoring Times ... Algorithm 2 DR-COSARC with Fixed Cutoffs ... Algorithm 3 DR-COSARC with Adaptive Cutoffs
Open Source Code	Yes	Software Availability A software implementation of the methods described in this paper is available online at https://github.com/msesia/conformal_survival.
Open Datasets	Yes	We apply our method to seven publicly available datasets: VALCT, PBC, GBSG, METABRIC, COLON, HEART, and RETINOPATHY. These datasets cover a range of study designs and sizes; Table A3 in Appendix A4 provides details on the number of observations, covariates, and data sources. ... The datasets were obtained from various publicly available sources. VALCT, PBC, COLON, HEART, and RETINOPATHY are included in the survival R package. GBSG was sourced from GitHub: https://github.com/jaredleekatzman/DeepSurv/. METABRIC was accessed via https://www.cbioportal.org/study/summary?id=brca_metabric.
Dataset Splits	Yes	We generate independent training, calibration, and test datasets, each with 1000 samples. ... The datasets are split into 60% for training, 20% for calibration, and 20% for testing, and each experiment is repeated 100 times using independent random splits.
Hardware Specification	No	The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It discusses computational cost but without specifying the underlying hardware.
Software Dependencies	No	The paper lists several R packages used for modeling (grf, survival, randomForestSRC), but it does not specify their version numbers or the version of R itself, which is required for reproducible software dependencies.
Experiment Setup	Yes	We compute 90% survival LPBs for the test set. Performance is evaluated by the average proportion of test points where the true survival time exceeds the LPB (targeting 90%) and the average LPB value... All experiments are repeated 100 times, and results are averaged. ... we always set equal to the median of the observed censoring times.
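
The split and evaluation protocol quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a short sketch. This is not the authors' implementation (their software is written around R packages); it is a minimal Python illustration. The function names `split_indices`, `empirical_coverage`, and `run_repetitions` are hypothetical, as is the callable `fit_and_predict_lpb`, which stands in for any method that returns lower prediction bounds (LPBs) for the test points. The sketch also assumes the true survival times are known for every test point, as in simulated data, and counts ties as covered.

```python
import numpy as np


def split_indices(n, rng, frac_train=0.6, frac_cal=0.2):
    """Randomly partition range(n) into train/calibration/test index arrays."""
    perm = rng.permutation(n)
    n_train = int(round(frac_train * n))
    n_cal = int(round(frac_cal * n))
    return perm[:n_train], perm[n_train:n_train + n_cal], perm[n_train + n_cal:]


def empirical_coverage(true_times, lpb):
    """Fraction of points whose true survival time is at least the LPB."""
    return float(np.mean(np.asarray(true_times) >= np.asarray(lpb)))


def run_repetitions(true_times, fit_and_predict_lpb, n_rep=100, seed=0):
    """Average coverage and average LPB over independent random splits.

    fit_and_predict_lpb(train_idx, cal_idx, test_idx) is a hypothetical
    callable returning one LPB per test index.
    """
    rng = np.random.default_rng(seed)
    true_times = np.asarray(true_times)
    coverages, mean_lpbs = [], []
    for _ in range(n_rep):
        train, cal, test = split_indices(len(true_times), rng)
        lpb = np.asarray(fit_and_predict_lpb(train, cal, test))
        coverages.append(empirical_coverage(true_times[test], lpb))
        mean_lpbs.append(float(np.mean(lpb)))
    return float(np.mean(coverages)), float(np.mean(mean_lpbs))
```

With a 90% target, the first returned value should be close to 0.9 for a well-calibrated method, and a larger second value indicates more informative bounds at the same coverage.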