Time-to-Event Prediction with Neural Networks and Cox Regression

Authors: Håvard Kvamme, Ørnulf Borgan, Ida Scheel

JMLR 2019

Each reproducibility variable is listed below with its result and the LLM's supporting response.
Research Type: Experimental
Evidence: Through simulation studies, the proposed loss function is verified to be a good approximation of the Cox partial log-likelihood. The proposed methodology is compared to existing methodologies on real-world data sets and is found to be highly competitive, typically yielding the best performance in terms of Brier score and binomial log-likelihood. From the paper: "In Section 5, we conduct a simulation study, verifying that the methods we propose behave as expected. In Section 6 we evaluate our methods on five real-world data sets and compare their performances with existing methodology."
Researcher Affiliation: Academia
Evidence: "Håvard Kvamme (EMAIL), Ørnulf Borgan (EMAIL), Ida Scheel (EMAIL), Department of Mathematics, University of Oslo, P.O. Box 1053 Blindern, 0316 Oslo, Norway"
Pseudocode: No
Evidence: The paper describes the methodologies using mathematical formulas and prose, but it does not contain clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
Evidence: "A python package for the proposed methods is available at https://github.com/havakv/pycox. Implementations of methods and the data sets are available at https://github.com/havakv/pycox."
Open Datasets: Yes
Evidence: "Implementations of methods and the data sets are available at https://github.com/havakv/pycox." "For FLCHAIN, we remove individuals with missing values. Further, we remove the chapter covariate, which gives the cause of death. Table 1 provides a summary of the data sets. For a more detailed description, we refer to the original sources (Therneau, 2015; Katzman et al., 2018)." "The WSDM KKBox's churn prediction challenge was proposed... The competition was hosted by Kaggle in 2017, with the goal of predicting customer churn on a data set donated by KKBox..." (https://www.kaggle.com/c/kkbox-churn-prediction-challenge)
Dataset Splits: Yes
Evidence: "As the four data sets are somewhat small, we scored our fitted models using 5-fold cross-validation, where the hyperparameter search was performed individually for each fold." "We split the data into a training, a testing, and a validation set, and some information about these subsets are listed in Table 5."
Hardware Specification: No
Evidence: The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies: No
Evidence: The paper mentions software such as the PyTorch framework (Paszke et al., 2017), the Lifelines python package (Davidson-Pilon et al., 2018), and the survival packages of R (Therneau, 2015), but it does not provide version numbers for these components, which are required for reproducibility.
Experiment Setup: Yes
Evidence: "The networks are standard multi-layer perceptrons with the same number of nodes in every layer, ReLU activations, and batch normalization between layers. We used dropout, normalized decoupled weight decay (Loshchilov and Hutter, 2019), and early stopping for regularization. SGD was performed by AdamWR (Loshchilov and Hutter, 2019) with an initial cycle length of one epoch, and we double the cycle length after each cycle. Learning rates were found using the methods proposed by Smith (2017). All networks were trained with batch size of 1028, and the best performing architectures can be found in Table 6. For the proposed Cox-MLP (CC) and Cox-Time, we used a fixed penalty λ = 0.001 in (10). Table A.1 and A.2 provide detailed hyperparameter search spaces and chosen values for KKBox."
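For context on the "approximation for the Cox partial log-likelihood" mentioned under Research Type: the paper's proposed loss approximates the full Cox partial log-likelihood by sampling from each risk set. A minimal NumPy sketch of the full (unsampled) negative partial log-likelihood it approximates — function and variable names are mine, and tied event times are not handled:

```python
import numpy as np

def neg_partial_log_likelihood(scores, durations, events):
    """Negative Cox partial log-likelihood, averaged over events.

    scores:    relative risk scores g(x_i) from the model (higher = higher hazard)
    durations: observed times
    events:    1 if the time is an observed event, 0 if censored
    """
    scores = np.asarray(scores, dtype=float)
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    total, n_events = 0.0, 0
    for i in np.flatnonzero(events):
        at_risk = durations >= durations[i]  # risk set R_i: still under observation
        total += np.log(np.exp(scores[at_risk]).sum()) - scores[i]
        n_events += 1
    return total / n_events
```

The paper's mini-batch-friendly loss replaces the full risk set `R_i` with a small sampled subset, which is what the simulation study verifies to be a good approximation.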
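The protocol under Dataset Splits (5-fold cross-validation with the hyperparameter search repeated inside each fold) can be sketched as below. This is a generic illustration, not the paper's code; `fit_and_score` and the candidate list are hypothetical stand-ins for the model-fitting routine and search space:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validate(n, candidates, fit_and_score, k=5, seed=0):
    """For each fold: search hyperparameters using only the training part,
    then score the best configuration on the held-out fold."""
    folds = kfold_indices(n, k, seed)
    test_scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Per-fold hyperparameter search, restricted to this fold's training data.
        best = max(candidates, key=lambda c: fit_and_score(c, train_idx, train_idx))
        test_scores.append(fit_and_score(best, train_idx, test_idx))
    return float(np.mean(test_scores))
```

Repeating the search per fold, rather than fixing one configuration up front, keeps the held-out scores honest: no test fold ever influences the hyperparameter choice used to score it.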
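Two pieces of the training recipe quoted under Experiment Setup are easy to make concrete: the AdamWR warm-restart schedule (an initial cycle of one epoch, doubling after each cycle) and early stopping. A small sketch under my own naming, with the patience value chosen arbitrarily for illustration:

```python
def cycle_lengths(initial_epochs=1, n_cycles=6):
    """AdamWR-style restart schedule: each cycle twice as long as the last."""
    return [initial_epochs * 2 ** i for i in range(n_cycles)]

class EarlyStopping:
    """Stop training once validation loss fails to improve for `patience` checks."""

    def __init__(self, patience=10):
        self.best = float("inf")
        self.bad_checks = 0
        self.patience = patience

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

With an initial cycle of one epoch, six cycles span 1 + 2 + 4 + 8 + 16 + 32 = 63 epochs; early stopping then bounds training independently of the schedule.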