Empirical Risk Minimization under Random Censorship

Authors: Guillaume Ausset, Stephan Clémençon, François Portier

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed. ... Numerical Experiments. Beyond the theoretical generalization guarantees established in the previous section, we now examine at length the performance of the predictive approach proposed in the context of regression based on censored data from an empirical perspective. We present various experiments using both synthetic and real data, and compare it to alternative methods documented in the survival analysis literature standing as natural competitors.
Researcher Affiliation | Collaboration | Guillaume Ausset (LTCI, Télécom Paris, Institut Polytechnique de Paris; BNP Paribas). Stephan Clémençon (LTCI, Télécom Paris, Institut Polytechnique de Paris). François Portier (CREST UMR 9194, ENSAI, Univ Rennes).
Pseudocode | No | The paper describes mathematical derivations, theoretical propositions, and experimental results, but does not include any clearly labeled pseudocode or algorithm blocks. Methods are described narratively.
Open Source Code | Yes | All the experiments and figures displayed in this article can be reproduced using the code available at https://github.com/aussetg/ipcw.
Open Datasets | Yes | The performance of the IPCW risk minimization approach is now investigated on the TCGA Cancer data (Grossman et al., 2016) using solely the RNA transcriptomes as informative variables.
Dataset Splits | Yes | The results are depicted in Fig. 1 for various sizes of the (censored) training sample and different censoring levels, the prediction error being evaluated by means of a test (uncensored) sample of size 5000. ... All models are trained on n = 8080 patients with a censoring rate of 18%, we measure on the remaining 1449 observed patients the prediction error.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions using certain software libraries.
Software Dependencies | No | The IPCW versions of the machine learning techniques for regression considered in these experiments, corresponding to the approach studied in the present article, have been implemented with Scikit-Learn (Pedregosa et al., 2011), combined with our own implementation of the LoO IPCW predictor we propose. For the survival machine learning methods mentioned above, we use the reference implementations of the Scikit Survival package (Pölsterl, 2020). The canonical implementation of Ishwaran and Kogalur (2007) is used for Random Survival Forest. While software libraries are mentioned, specific version numbers for these libraries are not provided in the text.
Experiment Setup | Yes | Calibration of $\hat{S}_C(\cdot \mid x)$. In order to fully specify the estimator $\hat{S}_C(\cdot \mid x)$, it may be necessary to choose specific hyperparameters. ... For $\hat{S}_C^{\mathrm{Kern}}$ and $\hat{S}_C^{(i),\mathrm{Kern}}$, we consider $\hat{m}_h(x)$ the nonparametric kernel regression of $Y$ w.r.t. $X$, known as the Nadaraya-Watson estimator, and the surrogate loss $E[|Y - \hat{m}_h(X)|^2]$ which is then minimized using cross-validation with respect to $h$. In this way, a value for the bandwidth parameter $h_{cv}$ is obtained and might be used in $\hat{S}_C^{\mathrm{Kern}}$ and $\hat{S}_C^{(i),\mathrm{Kern}}$. This approach is also easily applied to set the number of neighbours involved in $\hat{S}_C^{(i),\mathrm{KNN}}$. ... Consequently, we use $h = 5 h_{cv}$ in the following experiments. For $\hat{S}_C^{\mathrm{RF}}$, given the large number of hyperparameters, the default parameters selected by the package's authors have been used.
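The bandwidth calibration quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the authors' repository: the Gaussian kernel, the 5-fold split, the bandwidth grid, the synthetic data, and the function names `nadaraya_watson` and `cv_bandwidth` are all assumptions made for illustration. Only the overall recipe (minimize the cross-validated squared error of the Nadaraya-Watson regression of Y on X over h, then use 5 times the selected value) comes from the quoted text.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_query, h):
    """Nadaraya-Watson estimate at x_query (Gaussian kernel is an assumption)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / h**2)
    # Avoid division by zero when every kernel weight underflows to 0.
    denom = np.maximum(w.sum(axis=1), 1e-300)
    return (w @ y_train) / denom


def cv_bandwidth(x, y, grid, n_folds=5, seed=0):
    """Return the bandwidth in `grid` with the smallest K-fold squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = []
    for h in grid:
        errs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            pred = nadaraya_watson(x[train], y[train], x[test], h)
            errs.append(np.mean((y[test] - pred) ** 2))
        scores.append(np.mean(errs))
    return float(grid[int(np.argmin(scores))])


# Hypothetical synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(300, 1))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.normal(size=300)

grid = np.logspace(-2, 1, 20)   # candidate bandwidths (assumed grid)
h_cv = cv_bandwidth(x, y, grid)
h = 5 * h_cv                    # enlarged bandwidth, as in the quoted setup
```

The factor 5 is applied after selection, so the cross-validation step itself is unchanged. The same loop could be reused to pick a number of neighbours by swapping the kernel weights for nearest-neighbour weights.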