reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Arrow: Accelerator for Time Series Causal Discovery with Time Weaving

Authors: Yuanyuan Yao, Yuan Dong, Lu Chen, Kun Kuang, Ziquan Fang, Cheng Long, Yunjun Gao, Tianyi Li

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We applied ARROW to four different types of time series causal discovery algorithms and evaluated it on 25 synthetic and real-world datasets. The results demonstrate that, compared to the original algorithms, ARROW achieves up to 153x speedup while achieving higher accuracy in most cases.
Researcher Affiliation	Academia	(1) The College of Computer Science, Zhejiang University, Hangzhou 310027, China; (2) College of Computing and Data Science, Nanyang Technological University, Singapore; (3) The Department of Computer Science, Aalborg University, Denmark. Correspondence to: Lu Chen <EMAIL>.
Pseudocode	Yes	Algorithm 1 Candidates Pruning
Open Source Code	Yes	The source code of ARROW is available. Footnote 2: https://github.com/XiangguanMu/arrow
Open Datasets	Yes	In addition to the synthetic datasets, we also conduct validation on the real-world Dream3 dataset. Footnote 1: https://www.synapse.org/Synapse:syn3033083/files/
Dataset Splits	No	The paper mentions generating synthetic datasets with specific characteristics (e.g., "10 variables and a time length of 1000"), and using predefined ranges for time lags. However, it does not provide explicit training/validation/test splits, percentages, or absolute counts for any of the datasets used.
Hardware Specification	Yes	All methods are executed on a machine equipped with an Intel(R) Core(TM) i9-10900K CPU, boasting 10 cores and a clock speed of 3.70GHz. The system also features an NVIDIA GeForce RTX 3090 graphics card, equipped with 24GB of video memory.
Software Dependencies	No	The paper references open-source code for baselines (e.g., PCMCI, SURD, NGC, VARLiNGAM) by providing URLs to their repositories, but it does not specify version numbers for these software components or any other underlying libraries/frameworks (like Python, PyTorch, TensorFlow, etc.) used for implementation.
Experiment Setup	Yes	We synthesized a dataset with 10 variables and a time length of 1000. The window size w for time weaving was set to 1. In addition, we conducted experiments with constant time lags and multiple time lags. For the constant time lags, the lag between variables is fixed and is selected from the set {3, 5, 7, 9, 15, 20}. In contrast, with multiple time lags the lag varies between variables, with each lag value chosen from the same set {3, 5, 7, 9, 15, 20}. The experiments on varying time-lagged edges are deferred to Appendix D, while in the main experiments, we set k to {5, 15}.
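
The Experiment Setup row describes the synthetic data only at a high level, so the following is a rough sketch of what a comparable generator might look like, not the authors' code. The variable count (10), series length (1000), and lag set {3, 5, 7, 9, 15, 20} come from the row above. Everything else is an assumption made for illustration: the linear additive model with Gaussian noise, the restriction to acyclic edges (source index below target index, which keeps the series stable), and the hypothetical names `make_lagged_series`, `n_edges`, `coef`, and `noise_std`.

```python
import numpy as np

LAG_CHOICES = (3, 5, 7, 9, 15, 20)  # lag set quoted in the Experiment Setup row


def make_lagged_series(n_vars=10, length=1000, n_edges=10,
                       constant_lag=None, coef=0.5, noise_std=1.0, seed=0):
    """Return (data, edges) for a linear time-lagged system.

    Assumed model (not from the paper): x[t, dst] = noise + coef * x[t - lag, src].
    constant_lag=None gives the "multiple time lags" case, where every edge
    draws its own lag; an integer gives the "constant time lags" case.
    """
    rng = np.random.default_rng(seed)

    # Only edges with src < dst, so the summary graph is acyclic (assumption).
    pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
    chosen = rng.choice(len(pairs), size=n_edges, replace=False)

    edges = []
    for idx in chosen:
        src, dst = pairs[idx]
        if constant_lag is not None:
            lag = constant_lag
        else:
            lag = int(rng.choice(LAG_CHOICES))
        edges.append((src, dst, lag))

    # Extra leading rows so every lagged lookup has history; dropped on return.
    burn_in = max(lag for _, _, lag in edges)
    total = length + burn_in

    # Start from pure noise, then add each parent's lagged contribution.
    x = rng.normal(0.0, noise_std, size=(total, n_vars))
    for t in range(burn_in, total):
        for src, dst, lag in edges:
            x[t, dst] += coef * x[t - lag, src]

    return x[burn_in:], edges


# Constant-lag case: every edge uses lag 5.
data_const, edges_const = make_lagged_series(constant_lag=5)
# Multiple-lag case: each edge draws its lag from LAG_CHOICES.
data_multi, edges_multi = make_lagged_series(constant_lag=None, seed=1)
```

The returned `edges` list of `(source, target, lag)` triples serves as the ground-truth graph against which a causal discovery method's output could be scored.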