Double Machine Learning Based Structure Identification from Temporal Data
Authors: Emmanouil Angelis, Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Stefan Bauer
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The abstract states: 'We further perform extensive experiments to showcase the superior performance of our method.' Section 1 (Our contribution, point 4) adds: 'In extensive experiments we illustrate that our approach is significantly more robust, significantly faster, and more performative than state-of-the-art baselines.' The paper includes a dedicated '6 Experiments' section with 'Synthetic Experiments' and 'Semi-Synthetic Experiments', reporting AUROC, accuracy, CSI, and F1 scores in tables and figures, which are clear indicators of empirical evaluation. |
| Researcher Affiliation | Academia | Emmanouil Angelis (Helmholtz AI, Helmholtz Center Munich; Technical University of Munich), Francesco Quinzan (Department of Engineering Science, University of Oxford), Ashkan Soleymani (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology), Patrick Jaillet (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology), Stefan Bauer (Helmholtz AI, Helmholtz Center Munich; Technical University of Munich). All listed affiliations are universities or public research institutions. |
| Pseudocode | Yes | The paper includes a clearly labeled algorithm block: 'Algorithm 1 The DR-SIT' on page 5, which outlines the steps of the proposed method. |
| Open Source Code | Yes | The abstract explicitly states: 'Code: https://github.com/sdi1100041/TMLR_submission_DR_SIT'. |
| Open Datasets | Yes | The paper evaluates performance with the Dream3 benchmark, citing 'Prill et al., 2010; Marbach et al., 2009', indicating the use of a well-known, publicly available dataset. (Section 6.2) |
| Dataset Splits | Yes | The paper specifies the dataset splitting methodology: 'Cross-fitting scheme. We employ k = 5-fold cross-fitting, splitting trajectories uniformly at random into equal-sized folds, an essential step for valid double-machine-learning inference.' (Section 5.4) |
| Hardware Specification | Yes | Table 2 explicitly details the hardware used: '1 NVIDIA A100 GPU + AMD EPYC 7402 24-Core CPU' and '11th Gen Core i5-1140F CPU'. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper mentions using methods like 'kernel ridge regression' and 'MLP model' but does not specify library versions (e.g., scikit-learn, PyTorch versions). |
| Experiment Setup | Yes | The paper provides specific experimental setup details, including lag selection: 'For the synthetic datasets (6.1), we use the ground-truth lag employed in data generation. For DREAM3 (6.2) we fix lag = 2 for every method, following prior work.' (Section 5.4). It also lists hyperparameters for baselines: 'We specifically use the following hyper-parameters for Rhino: Node Embedd. = 16, Instantaneous eff. = False, Node Embedd. (flow) = 16, lag = 2, λs = 19, Auglag = 30. And we use the following for Rhino+g: Node Embedd. = 16, Instantaneous eff. = False, lag = 2, λs = 15, Auglag = 60.' (Section 6.2). |
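The cross-fitting and lag choices quoted above (k = 5-fold cross-fitting over trajectories, lag = 2) can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the kernel ridge nuisance model, and the toy data are illustrative assumptions; only the k = 5 fold count, the shuffled fold assignment, and the lag = 2 feature construction come from the paper's stated setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold


def lagged_design(series, lag=2):
    """Build a lagged regression problem from a (T, d) trajectory:
    features are the previous `lag` values of every variable,
    the target is the current value of variable 0."""
    T, d = series.shape
    # Row t of X holds [series[t-1], series[t-2], ..., series[t-lag]].
    X = np.hstack([series[lag - j:T - j] for j in range(1, lag + 1)])
    y = series[lag:, 0]
    return X, y


def cross_fit_residuals(X, y, k=5, seed=0):
    """k-fold cross-fitting: each sample's nuisance prediction comes
    from a model trained on the other k-1 folds, so the residuals
    used downstream are out-of-fold (the DML validity requirement)."""
    res = np.empty_like(y)
    for tr, te in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = KernelRidge(kernel="rbf", alpha=1.0)  # illustrative nuisance learner
        model.fit(X[tr], y[tr])
        res[te] = y[te] - model.predict(X[te])
    return res


# Toy trajectory standing in for one synthetic time series.
rng = np.random.default_rng(0)
series = rng.normal(size=(300, 3)).cumsum(axis=0)
X, y = lagged_design(series, lag=2)
res = cross_fit_residuals(X, y, k=5)
print(X.shape, res.shape)  # (298, 6) (298,)
```

With T = 300 observations and lag = 2, the design matrix loses the first two rows (298 samples) and stacks 2 lags of 3 variables into 6 columns; the residual vector matches the target length, one out-of-fold residual per sample.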