Stabilized Neural Prediction of Potential Outcomes in Continuous Time
Authors: Konstantin Hess, Stefan Feuerriegel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through extensive experiments that our SCIP-Net outperforms existing neural methods. ... 5 NUMERICAL EXPERIMENTS Baselines: We now demonstrate the performance of our SCIP-Net against key neural baselines for estimating CAPOs over time (see Table 1). ... Datasets: We use a (i) synthetic dataset based on a tumor growth model (Geng et al., 2017), and a (ii) semi-synthetic dataset based on the MIMIC-III dataset (Johnson et al., 2016). For both datasets, the outcomes are simulated, so that we have access to the ground-truth potential outcomes, which allows for comparing the performance in terms of root mean squared error (RMSE). |
| Researcher Affiliation | Academia | Konstantin Hess & Stefan Feuerriegel Munich Center for Machine Learning LMU Munich EMAIL |
| Pseudocode | No | The paper describes the SCIP-Net methodology in Section 4 and its subsections, using prose, mathematical equations, and diagrams (Figure 2), but does not contain any explicitly formatted pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/konstantinhess/SCIP-Net. |
| Open Datasets | Yes | Datasets: We use a (i) synthetic dataset based on a tumor growth model (Geng et al., 2017), and a (ii) semi-synthetic dataset based on the MIMIC-III dataset (Johnson et al., 2016). |
| Dataset Splits | Yes | The observed time window for training, validation, and testing is set to τ = 30 days. We generate 1000 observations for training, validation, and testing, respectively. ... In our experiments in Sec. 5, we used 1000 samples for training, validation and testing, respectively. |
| Hardware Specification | Yes | Runtime: All methods were trained on 1 NVIDIA A100-PCIE-40GB. |
| Software Dependencies | No | For optimization, we use Adam (Kingma & Ba, 2015). Both TE-CDE (Seedat et al., 2022) and our SCIP-Net used a simple Euler quadrature and linear interpolation for the Neural CDE control path (Morrill et al., 2021). The paper mentions an optimizer (Adam) and numerical methods (Euler quadrature, linear interpolation) but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | In order to ensure a fair comparison of all methods, we closely follow hyperparameter tuning as in (Melnychuk et al., 2022) and (Hess et al., 2024a). In particular, we performed a random grid search. Below, we report the tuning grid for each method. Importantly, all methods are only tuned on factual data. For optimization, we use Adam (Kingma & Ba, 2015). ... Table 7: Following (Melnychuk et al., 2022), we let dyxa = dy + dx + da be the overall input size. Further, dz is the hidden representation size of our SCIP-Net, and corresponds to the balanced representation size of TE-CDE (Seedat et al., 2022), CRN (Bica et al., 2020), and CT (Melnychuk et al., 2022), and the LSTM output size of G-Net (Li et al., 2021). (Table 7 lists specific tuning ranges for hyperparameters such as learning rate, minibatch size, Neural CDE hidden units, Neural CDE dropout rate, max gradient norm, and number of epochs for SCIP-Net and each baseline.) |
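The tuning procedure quoted above is a random grid search over per-method hyperparameter grids. As a minimal sketch of that procedure, the snippet below draws random configurations from a grid; the hyperparameter names mirror those listed for Table 7, but the candidate values are hypothetical placeholders, since the report does not reproduce the paper's actual ranges.

```python
import random

# Hypothetical tuning grid -- the real candidate values are in Table 7 of
# the paper and are NOT reproduced here.
GRID = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "minibatch_size": [32, 64, 128],
    "cde_hidden_units": [32, 64, 128],
    "cde_dropout_rate": [0.1, 0.2],
    "max_grad_norm": [0.5, 1.0, 2.0],
}

def sample_configs(grid, n_trials, seed=0):
    """Random grid search: draw n_trials configurations uniformly at
    random from the Cartesian product defined by the grid."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in grid.items()}
            for _ in range(n_trials)]

# Each sampled config would then be trained and scored on factual
# validation data only, as the paper specifies.
configs = sample_configs(GRID, n_trials=5)
```

Note that, per the quoted setup, model selection uses only factual (observed) trajectories; the ground-truth counterfactual outcomes are reserved for test-time RMSE evaluation.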