reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Empirical Risk Minimization under Random Censorship

Authors: Guillaume Ausset, Stephan Clémençon, François Portier

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed. ... Numerical Experiments. Beyond the theoretical generalization guarantees established in the previous section, we now examine at length the performance of the predictive approach proposed in the context of regression based on censored data from an empirical perspective. We present various experiments using both synthetic and real data, and compare it to alternative methods documented in the survival analysis literature standing as natural competitors.
Researcher Affiliation	Collaboration	Guillaume Ausset: LTCI, Télécom Paris, Institut Polytechnique de Paris; BNP Paribas. Stephan Clémençon: LTCI, Télécom Paris, Institut Polytechnique de Paris. François Portier: CREST UMR 9194, ENSAI, Univ Rennes.
Pseudocode	No	The paper describes mathematical derivations, theoretical propositions, and experimental results, but does not include any clearly labeled pseudocode or algorithm blocks. Methods are described narratively.
Open Source Code	Yes	All the experiments and figures displayed in this article can be reproduced using the code available at https://github.com/aussetg/ipcw.
Open Datasets	Yes	The performance of the IPCW risk minimization approach is now investigated on the TCGA Cancer data (Grossman et al., 2016) using solely the RNA transcriptomes as informative variables.
Dataset Splits	Yes	The results are depicted in Fig. 1 for various sizes of the (censored) training sample and different censoring levels, the prediction error being evaluated on a test (uncensored) sample of size 5000. ... All models are trained on n = 8080 patients with a censoring rate of 18%; the prediction error is measured on the remaining 1449 observed patients.
Hardware Specification	No	The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions using certain software libraries.
Software Dependencies	No	The IPCW versions of the machine learning techniques for regression considered in these experiments, corresponding to the approach studied in the present article, have been implemented with Scikit-Learn (Pedregosa et al., 2011), combined with our own implementation of the LoO IPCW predictor we propose. For the survival machine learning methods mentioned above, we use the reference implementations of the Scikit-Survival package (Pölsterl, 2020). The canonical implementation of Ishwaran and Kogalur (2007) is used for Random Survival Forest. While software libraries are mentioned, specific version numbers for these libraries are not provided in the text.
Experiment Setup	Yes	Calibration of $\hat S_C(\cdot \mid x)$. In order to fully specify the estimator $\hat S_C(\cdot \mid x)$, it may be necessary to choose specific hyperparameters. ... For $\hat S^{\mathrm{Kern}}_C$ and $\hat S^{(i),\mathrm{Kern}}_C$, we consider $\hat m_h(x)$, the nonparametric kernel regression of $Y$ w.r.t. $X$ known as the Nadaraya-Watson estimator, and the surrogate loss $\mathbb{E}[|Y - \hat m_h(X)|^2]$, which is then minimized using cross-validation with respect to $h$. In this way, a value $h_{cv}$ for the bandwidth parameter is obtained and might be used in $\hat S^{\mathrm{Kern}}_C$ and $\hat S^{(i),\mathrm{Kern}}_C$. This approach is also easily applied to set the number of neighbours involved in $\hat S^{(i),\mathrm{KNN}}_C$. ... Consequently, we use $h = 5h_{cv}$ in the following experiments. For $\hat S^{\mathrm{RF}}_C$, given the large number of hyperparameters, the default parameters selected by the package's authors have been used.
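The Dataset Splits row mentions synthetic experiments run at "different censoring levels". The sketch below is illustrative only and is not taken from the paper's code. It shows one way to generate a right-censored training sample under random censorship: the learner sees $(\min(Y, C), \delta)$ with $\delta = 1\{Y \le C\}$. Exponential laws are assumed for both $Y$ and $C$, and the helper `censor_sample` is hypothetical. With $Y \sim \mathrm{Exp}(1)$ and $C$ exponential with mean `scale`, the theoretical censoring rate is $1/(1 + \text{scale})$, so a larger scale means lighter censoring.

```python
import random

def censor_sample(ys, censor_scale, rng):
    """Right-censor the true times ys: return (min(Y, C), delta) pairs as two lists.

    Hypothetical helper: C ~ Exponential(mean=censor_scale) is an illustrative
    assumption, not the censoring law used in the paper's experiments.
    """
    observed, deltas = [], []
    for y in ys:
        c = rng.expovariate(1.0 / censor_scale)  # expovariate takes the rate, i.e. 1 / mean
        observed.append(min(y, c))
        deltas.append(1 if y <= c else 0)        # delta = 1 when the true time is observed
    return observed, deltas

rng = random.Random(0)
# Hypothetical true durations Y ~ Exponential(mean 1).
ys = [rng.expovariate(1.0) for _ in range(2000)]
for scale in (0.5, 2.0, 8.0):
    obs, delta = censor_sample(ys, scale, rng)
    rate = 1 - sum(delta) / len(delta)
    print(f"censoring scale {scale}: empirical censoring rate {rate:.2f}")
```

Only the training sample goes through `censor_sample`. An uncensored test sample, as in the quoted setup, would keep the true `ys` directly.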
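The Software Dependencies row states that "IPCW versions" of standard regression learners were built on Scikit-Learn. Below is a minimal sketch of the underlying idea, not the authors' implementation. Each uncensored point receives the weight $\delta_i / \hat S_C(\tilde Y_i^- )$, where $\tilde Y_i = \min(Y_i, C_i)$. A weighted least-squares fit then minimizes the weighted empirical risk. To keep the example short, it assumes that censoring is independent of $(X, Y)$. Under that assumption, the marginal Kaplan-Meier estimate of $S_C$ can stand in for the conditional $\hat S_C(\cdot \mid x)$ used in the paper. The leave-one-out (LoO) variant is not reproduced. All function names and the simulated data are hypothetical.

```python
import random

def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of the censoring survival S_C(t) = P(C > t).

    Censoring plays the role of the event (indicator 1 - delta). The returned
    function gives the left limit S_C(t-), a product over censoring times < t,
    so a censoring tied with an observed event does not shrink that event's weight.
    """
    cens_times = sorted({t for t, d in zip(times, deltas) if d == 0})
    factors = []
    for c in cens_times:
        at_risk = sum(1 for t in times if t >= c)
        n_cens = sum(1 for t, d in zip(times, deltas) if t == c and d == 0)
        factors.append((c, 1.0 - n_cens / at_risk))

    def surv_left(t):
        s = 1.0
        for c, f in factors:
            if c >= t:
                break
            s *= f
        return s
    return surv_left

def ipcw_weights(times, deltas):
    """w_i = delta_i / S_C(T_i-); censored observations get weight 0."""
    s_c = km_censoring_survival(times, deltas)
    return [d / s_c(t) if d else 0.0 for t, d in zip(times, deltas)]

def weighted_linear_fit(xs, ys, ws):
    """Closed-form minimizer of sum_i w_i (y_i - a - b x_i)^2 for a scalar feature."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b

# Hypothetical data: Y = 1 + 2X + noise, censoring C independent of (X, Y).
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(3000)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
cs = [rng.expovariate(1.0 / 4.0) for _ in xs]
times = [min(y, c) for y, c in zip(ys, cs)]
deltas = [1 if y <= c else 0 for y, c in zip(ys, cs)]

w = ipcw_weights(times, deltas)
# Censored points have weight 0, so only observed times enter the fit.
intercept, slope = weighted_linear_fit(xs, times, w)
print(f"IPCW fit: intercept={intercept:.2f}, slope={slope:.2f}")
```

In scikit-learn, the same weights could be passed through the `sample_weight` argument of an estimator's `fit` method, for example `LinearRegression().fit(X, times, sample_weight=w)`. This is how a weighted variant of an off-the-shelf regressor can be obtained without modifying it.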
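The bandwidth calibration quoted in the Experiment Setup row can be sketched as follows. The sketch fits a Nadaraya-Watson regression of $Y$ on $X$ and picks $h_{cv}$ by minimizing a cross-validated estimate of $\mathbb{E}[|Y - \hat m_h(X)|^2]$. It then inflates the result to $h = 5h_{cv}$. Several choices here are assumptions rather than details from the text: a Gaussian kernel, a scalar covariate, leave-one-out cross-validation over a fixed grid (the text only says "cross-validation"), and the simulated data. The function names are hypothetical. Only the factor 5 comes from the quoted setup.

```python
import math
import random

def nadaraya_watson(x0, xs, ys, h, skip=None):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x0] with bandwidth h.

    `skip` optionally drops one training index (used for leave-one-out CV).
    """
    num = den = 0.0
    for j, (x, y) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        k = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += k * y
        den += k
    # Fall back to the plain mean if every kernel weight underflowed to zero.
    return num / den if den > 0.0 else sum(ys) / len(ys)

def loo_cv_loss(xs, ys, h):
    """Leave-one-out estimate of the surrogate loss E[|Y - m_h(X)|^2]."""
    n = len(xs)
    return sum((ys[i] - nadaraya_watson(xs[i], xs, ys, h, skip=i)) ** 2
               for i in range(n)) / n

def select_bandwidth(xs, ys, grid):
    """Return the grid value minimizing the leave-one-out loss, i.e. h_cv."""
    return min(grid, key=lambda h: loo_cv_loss(xs, ys, h))

# Hypothetical data: Y = sin(2 pi X) + noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]

grid = [0.01 * k for k in range(2, 31)]  # candidate bandwidths 0.02 ... 0.30
h_cv = select_bandwidth(xs, ys, grid)
h = 5 * h_cv                             # inflation quoted in the text: h = 5 h_cv
print(f"h_cv = {h_cv:.2f}, bandwidth used downstream h = {h:.2f}")
```

In the quoted setup, the inflated bandwidth is then plugged into the kernel estimators of the censoring survival function, which this sketch does not implement. Running the same grid search over integer values of k, with a k-nearest-neighbour average in place of `nadaraya_watson`, would select the number of neighbours in the same way.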