Double Machine Learning Based Structure Identification from Temporal Data
Authors: Emmanouil Angelis, Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Stefan Bauer
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The abstract states: 'We further perform extensive experiments to showcase the superior performance of our method.' Section 1 (Our contribution, point 4) adds: 'In extensive experiments we illustrate that our approach is significantly more robust, significantly faster, and more performative than state-of-the-art baselines.' The paper includes a dedicated '6 Experiments' section with 'Synthetic Experiments' and 'Semi-Synthetic Experiments', reporting AUROC, accuracy, CSI, and F1 scores in tables and figures, which are clear indicators of empirical evaluation. |
| Researcher Affiliation | Academia | Emmanouil Angelis (Helmholtz AI, Helmholtz Center Munich; Technical University of Munich), Francesco Quinzan (Department of Engineering Science, University of Oxford), Ashkan Soleymani (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology), Patrick Jaillet (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology), Stefan Bauer (Helmholtz AI, Helmholtz Center Munich; Technical University of Munich). All listed affiliations are universities or public research institutions. |
| Pseudocode | Yes | The paper includes a clearly labeled algorithm block: 'Algorithm 1 The DR-SIT' on page 5, which outlines the steps of the proposed method. |
| Open Source Code | Yes | The abstract explicitly states: 'Code: https://github.com/sdi1100041/TMLR_submission_DR_SIT'. |
| Open Datasets | Yes | The paper evaluates performance with the Dream3 benchmark, citing 'Prill et al., 2010; Marbach et al., 2009', indicating the use of a well-known, publicly available dataset. (Section 6.2) |
| Dataset Splits | Yes | The paper specifies the dataset splitting methodology: 'Cross-fitting scheme. We employ k = 5-fold cross-fitting, splitting trajectories uniformly at random into equal-sized folds, an essential step for valid double-machine-learning inference.' (Section 5.4) |
| Hardware Specification | Yes | Table 2 explicitly details the hardware used: '1 NVIDIA A100 GPU + AMD EPYC 7402 24-Core CPU' and '11th Gen Core i5-1140F CPU'. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper mentions using methods like 'kernel ridge regression' and 'MLP model' but does not specify library versions (e.g., scikit-learn, PyTorch versions). |
| Experiment Setup | Yes | The paper provides specific experimental setup details, including lag selection: 'For the synthetic datasets (6.1), we use the ground-truth lag employed in data generation. For DREAM3 (6.2) we fix lag = 2 for every method, following prior work.' (Section 5.4). It also lists hyperparameters for baselines: 'We specifically use the following hyper-parameters for Rhino: Node Embedd. = 16, Instantaneous eff. = False, Node Embedd. (flow) = 16, lag = 2, λs = 19, Auglag = 30. And we use the following for Rhino+g: Node Embedd. = 16, Instantaneous eff. = False, lag = 2, λs = 15, Auglag = 60.' (Section 6.2). |
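The cross-fitting and lag choices quoted above (k = 5-fold cross-fitting over trajectories, lag = 2) can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the kernel ridge nuisance model, and the toy data are illustrative assumptions; only the k = 5 fold count, the shuffled fold assignment, and the lag = 2 feature construction come from the paper's stated setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold


def lagged_design(series, lag=2):
    """Build a lagged regression problem from a (T, d) trajectory:
    features are the previous `lag` values of every variable,
    the target is the current value of variable 0."""
    T, d = series.shape
    # Row t of X holds [series[t-1], series[t-2], ..., series[t-lag]].
    X = np.hstack([series[lag - j:T - j] for j in range(1, lag + 1)])
    y = series[lag:, 0]
    return X, y


def cross_fit_residuals(X, y, k=5, seed=0):
    """k-fold cross-fitting: each sample's nuisance prediction comes
    from a model trained on the other k-1 folds, so the residuals
    used downstream are out-of-fold (the DML validity requirement)."""
    res = np.empty_like(y)
    for tr, te in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = KernelRidge(kernel="rbf", alpha=1.0)  # illustrative nuisance learner
        model.fit(X[tr], y[tr])
        res[te] = y[te] - model.predict(X[te])
    return res


# Toy trajectory standing in for one synthetic time series.
rng = np.random.default_rng(0)
series = rng.normal(size=(300, 3)).cumsum(axis=0)
X, y = lagged_design(series, lag=2)
res = cross_fit_residuals(X, y, k=5)
print(X.shape, res.shape)  # (298, 6) (298,)
```

With T = 300 observations and lag = 2, the design matrix loses the first two rows (298 samples) and stacks 2 lags of 3 variables into 6 columns; the residual vector matches the target length, one out-of-fold residual per sample.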