Conformalized Survival Analysis for General Right-Censored Data
Authors: Hen Davidov, Shai Feldman, Gil Shamai, Ron Kimmel, Yaniv Romano
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the informativeness and validity of our methods in simulated settings and showcase their practical utility using several real-world datasets. |
| Researcher Affiliation | Academia | Hen Davidov, Shai Feldman, Gil Shamai, Ron Kimmel, Yaniv Romano |
| Pseudocode | Yes | A formal description of the algorithm for the naive calibration method is given by Algorithm 1. ... A formal description of the focused calibration algorithm is given in Algorithm 2. ... A formal description of the fused calibration method is presented in Algorithm 3. |
| Open Source Code | Yes | A Python implementation of our methods is provided in our GitHub repository. |
| Open Datasets | Yes | We demonstrate the practical utility of our methods by applying them to six real-world datasets: The Northern Alberta Cancer Dataset (NACD) (Haider et al., 2020), Rotterdam & German Breast Cancer Study Group (GBSG), Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), Study to Understand Prognoses Preferences Outcomes and Risks of Treatment (SUPPORT) (Kvamme et al., 2019; Katzman et al., 2018), a user churn dataset (Fotso et al., 2019–present), as well as The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) multimodal dataset collection (Tomczak et al., 2015). ... The synthetic data generation function and the processed TCGA-BRCA dataset are available in the GitHub repository. |
| Dataset Splits | Yes | In all experiments, the dataset was split into four parts: 60% for training, 20% for calibration, 10% for validation (used for early stopping), and 10% for testing to evaluate performance. ... The reported performance metrics are evaluated on 50 independent trials, each consisting of newly sampled train, validation, calibration, and test sets of sizes 600, 200, 1000, and 200, respectively. |
| Hardware Specification | Yes | CPU: AMD EPYC 7443 24-Core Processor GPU: NVIDIA RTX A6000 OS: Ubuntu 20.04 |
| Software Dependencies | No | The paper mentions software such as the 'DeepSurv method (Katzman et al., 2018)', the 'pycox package (Kvamme et al., 2019), implemented using a PyTorch MLP regressor', and 'scikit-learn (Pedregosa et al., 2011) to train Random Forest classifiers'. However, specific version numbers for PyTorch or scikit-learn are not provided. |
| Experiment Setup | Yes | In all experiments, we approximate the distribution of T \| X using the DeepSurv method (Katzman et al., 2018), as implemented in the pycox package (Kvamme et al., 2019), using a PyTorch MLP regressor with ReLU activations, early stopping (triggered after 5 epochs without improvement), and a training cycle of 1000 epochs. The model was optimized with Adam (lr = 1e-3, β1 = 0.9, β2 = 0.999), a batch size of 256, dropout layers with rate p = 0.1, batch normalization layers, and varying configurations of hidden layers, detailed in Table 4. These configurations were selected to be similar to those found in the PyCox notebooks, with the real-world datasets getting a deeper model to account for their more complex and interconnected nature. Additionally, we employed scikit-learn (Pedregosa et al., 2011) to train Random Forest classifiers with max depths of 4 and 2 to estimate the weights ŵτ and the indicator ŝτ, respectively. |
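The 60%/20%/10%/10% split protocol reported above can be sketched as follows; the function name, seed, and dataset size are illustrative, not from the paper:

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle n indices and split them 60/20/10/10 into
    train / calibration / validation / test, as in the reported setup."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.6 * n)
    n_cal = int(0.2 * n)
    n_val = int(0.1 * n)
    train = idx[:n_train]
    cal = idx[n_train:n_train + n_cal]
    val = idx[n_train + n_cal:n_train + n_cal + n_val]
    test = idx[n_train + n_cal + n_val:]
    return train, cal, val, test

# Example: a dataset of 1000 samples yields 600/200/100/100 indices.
train, cal, val, test = split_indices(1000)
```

Note that this is the percentage-based split; the synthetic experiments instead draw fixed-size sets (600/200/1000/200 for train/validation/calibration/test) in each of the 50 trials.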
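The two Random Forest classifiers described in the setup (max depths 4 and 2, estimating the weights ŵτ and the indicator ŝτ) could be instantiated as below. This is a minimal sketch: the toy covariates, labels, and feature dimension are placeholders, not the paper's actual estimation targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                           # placeholder covariates
y_w = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # placeholder label for the weight model
y_s = (X[:, 1] > 0).astype(int)                          # placeholder label for the indicator model

# Depth-4 forest for the weights \hat{w}_tau, depth-2 forest for the
# indicator \hat{s}_tau, matching the depths reported in the setup.
w_model = RandomForestClassifier(max_depth=4, random_state=0).fit(X, y_w)
s_model = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y_s)

w_hat = w_model.predict_proba(X)[:, 1]  # probability estimates used as weights
s_hat = s_model.predict(X)              # binary indicator estimates
```

Using `predict_proba` for the weights and `predict` for the indicator reflects that ŵτ is a probability-like quantity while ŝτ is a 0/1 decision; the paper's exact targets for each classifier are defined by its calibration algorithms.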