Causal Ordering for Structure Learning from Time Series
Authors: Pedro Sanchez, Damian Machlanski, Steven McDonagh, Sotirios A. Tsaftaris
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the approach. Empirical evaluations on synthetic and real-world datasets demonstrate that DOTS outperforms state-of-the-art baselines, offering a scalable and robust approach to temporal causal discovery. On synthetic benchmarks spanning d = 3–6 variables, T = 200–5,000 samples, and up to three lags, DOTS improves mean window-graph F1 from 0.63 (best baseline) to 0.81. On the CausalTime real-world benchmark (Medical, AQI, Traffic; d = 20–36), while baselines remain the best on individual datasets, DOTS attains the highest average summary-graph F1 while halving runtime relative to graph-optimisation methods. |
| Researcher Affiliation | Academia | Pedro P. Sanchez EMAIL School of Engineering, University of Edinburgh, UK; Damian Machlanski EMAIL School of Engineering, University of Edinburgh, UK, Causality in Healthcare AI Hub (CHAI), UK; Steven McDonagh EMAIL School of Engineering, University of Edinburgh, UK, Causality in Healthcare AI Hub (CHAI), UK; Sotirios A. Tsaftaris EMAIL School of Engineering, University of Edinburgh, UK, Causality in Healthcare AI Hub (CHAI), UK |
| Pseudocode | Yes | Algorithm 1 Estimating Multi-Scale Causal Orderings. |
| Open Source Code | Yes | Code is available at https://github.com/CHAI-UK/DOTS. |
| Open Datasets | Yes | We also perform experiments on datasets closer to real-life complexities. To achieve this, we incorporate CausalTime (Cheng et al., 2024), a realistic benchmark for time series causal discovery. CausalTime provides three datasets: Air Quality Index (AQI), Traffic, and Medical. ... Medical: N=20 vital-sign and chart-event channels extracted from 1000 MIMIC-IV ICU stays, resampled to 2-h resolution (T=600 on average). |
| Dataset Splits | No | The paper describes how synthetic data is generated and how real-world datasets are combined and pre-processed for use (e.g., combining 480 samples of length T=40 into a single dataset of length T=19679). However, it does not explicitly provide information on dataset splits such as training, validation, or test sets with specific percentages, counts, or references to predefined splits for model learning. The evaluation compares predicted edges to ground truth, implying the entire dataset is used for analysis without explicit splits for learning. |
| Hardware Specification | No | The paper includes experimental results and runtime analysis (e.g., Figure 8 showing average runtime on synthetic data), but it does not provide specific details about the hardware used to run these experiments (e.g., GPU models, CPU types, or memory configurations). |
| Software Dependencies | No | Table 4 provides a "Summary of source code used to run the methods in the experiments", listing the methods and their respective GitHub repositories. However, it does not specify version numbers for any of these libraries, underlying programming languages (e.g., Python), or other essential software dependencies required for reproducibility. |
| Experiment Setup | Yes | Table 3: Summary of hyperparameters of all methods used in the experiments. This table lists specific hyperparameters and their values for various methods, including CAM (alpha = 0.05), SCORE (α = 0.05, ηG = 0.001, ηH = 0.001), TCDF (epochs = 5000, layers = 2, lr = 0.01, kernel_size = 4, dilation = 4, significance = 0.8), DiffAN (steps = 100, nn_depth = 3, batch_size = 1024, early_stop = 300, lr = 0.001), and DOTS (steps = 100, nn_depth = 3, batch_size = 1024, early_stop = 300, lr = 0.001, n_ord = 10). |
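For anyone scripting a rerun, the hyperparameters quoted from Table 3 can be collected into a plain configuration mapping. This is an illustrative sketch only: the dictionary structure, key names, and the `config_for` helper are assumptions made here for convenience, not code from the paper's repository; only the values are taken from the table above.

```python
# Hyperparameter values as reported in Table 3 of the paper.
# The mapping layout and key spellings are assumptions for scripting
# convenience; consult https://github.com/CHAI-UK/DOTS for the real config.
CONFIGS = {
    "CAM": {"alpha": 0.05},
    "SCORE": {"alpha": 0.05, "eta_G": 0.001, "eta_H": 0.001},
    "TCDF": {"epochs": 5000, "layers": 2, "lr": 0.01,
             "kernel_size": 4, "dilation": 4, "significance": 0.8},
    "DiffAN": {"steps": 100, "nn_depth": 3, "batch_size": 1024,
               "early_stop": 300, "lr": 0.001},
    "DOTS": {"steps": 100, "nn_depth": 3, "batch_size": 1024,
             "early_stop": 300, "lr": 0.001, "n_ord": 10},
}

def config_for(method: str) -> dict:
    """Return a copy of the recorded hyperparameters for a method."""
    return dict(CONFIGS[method])
```

Note that DOTS shares the DiffAN settings plus an extra `n_ord = 10`, consistent with the paper's description of DOTS as estimating multiple causal orderings.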