Sortability of Time Series Data

Authors: Christopher Lohse, Jonas Wahl

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we show that certain characteristics of datasets, such as varsortability (Reisach et al., 2021) and R2-sortability (Reisach et al., 2023), also occur in datasets for autocorrelated stationary time series. We illustrate this empirically using four types of data: simulated data based on SVAR models and Erdős-Rényi graphs, the data used in the 2019 causality-for-climate challenge (Runge et al., 2019), real-world river stream datasets, and real-world data generated by the Causal Chamber of Gamella et al. (2024). To do this, we adapt var- and R2-sortability to time series data. We also investigate the extent to which the performance of continuous score-based causal discovery methods goes hand in hand with high sortability. Arguably, our most surprising finding is that the investigated real-world datasets exhibit high varsortability and low R2-sortability, indicating that scales may carry a significant amount of causal information.
Researcher Affiliation | Collaboration | Christopher Lohse: School of Computer Science and Statistics, University of Dublin, Trinity College; IBM Research Europe, Dublin. Jonas Wahl: Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI).
Pseudocode | No | The paper describes the 'sortnregress' algorithm in Section 4 using narrative text and numbered steps, but it does not present it in a formally structured pseudocode block or algorithm environment.
Open Source Code | No | We conduct our experiments in Python, using the TIGRAMITE library (https://github.com/jakobrunge/tigramite) for simulating data with SVAR models. The paper cites a third-party library used for data simulation but does not provide access to the authors' own implementation of the methodology described in the paper.
Open Datasets | Yes | We illustrate this empirically using four types of data: simulated data based on SVAR models and Erdős-Rényi graphs, the data used in the 2019 causality-for-climate challenge (Runge et al., 2019; https://causeme.uv.es/neurips2019/), real-world river stream datasets, and real-world data generated by the Causal Chamber of Gamella et al. (2024; https://github.com/juangamella/causal-chamber).
Dataset Splits | No | For each number of nodes we randomly generate 500 different graphs and generate n = 500 samples per graph. The dataset includes simulated and partially simulated datasets of varying complexity, including high-dimensional datasets and non-linear dependencies, with 100, 150, 600 or 1000 realisations for each dataset specification. The paper states the number of samples or realisations used for data generation but does not specify any training, validation, or test splits.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models, or memory specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | We conduct our experiments in Python, using the TIGRAMITE library (https://github.com/jakobrunge/tigramite) for simulating data with SVAR models. The paper also names the DYNOTEARS and PCMCI+ algorithms and the ParCorr conditional independence test, but no version numbers are provided for Python or for any of the libraries and tools used.
Experiment Setup | Yes | When assessing the performance of different algorithms across a range of sortability values, the hyperparameters of the DYNOTEARS algorithm are set to λ1 = λ2 = 0.05 and the weight threshold to 0.1. As a constraint-based comparison algorithm, PCMCI+ (Runge, 2020) is run with α = 0.01 and the ParCorr conditional independence test. We further use the varsortnregress, R2-sortnregress and random-regress algorithms as described in Section 4.
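To make the central notion concrete, here is a minimal sketch of how varsortability can be computed on a data matrix with a known causal graph. This is the static (i.i.d.) variant of Reisach et al. (2021), not the time-series adaptation the paper introduces; the function name and tie-handling tolerance are illustrative.

```python
import numpy as np

def varsortability(X, A, tol=1e-9):
    """Fraction of pairs (i, j) connected by a directed path i -> ... -> j
    whose marginal variance increases along the path (Reisach et al., 2021).
    X: (n_samples, d) data matrix; A: (d, d) adjacency, A[i, j] != 0 iff i -> j.
    Ties within `tol` count as 1/2."""
    d = A.shape[0]
    var = np.var(X, axis=0)
    Ak = np.eye(d)
    num = den = 0.0
    for _ in range(d - 1):
        Ak = Ak @ A  # Ak[i, j] != 0 iff a directed path of the current length exists
        for i, j in np.argwhere(Ak != 0):
            num += 0.5 if abs(var[j] - var[i]) < tol else float(var[j] > var[i])
            den += 1.0
    return num / den if den else np.nan
```

A value near 1 means marginal variances increase along the causal order, which is exactly the regularity that variance-sorting baselines exploit.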
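The experiments simulate data from SVAR models via TIGRAMITE. As a self-contained illustration of what such a data-generating process looks like, here is a minimal SVAR(1) simulator in plain NumPy; this is not TIGRAMITE's API, and the function name and defaults are illustrative.

```python
import numpy as np

def simulate_svar1(B0, B1, n_steps, burn_in=200, seed=0):
    """Simulate X_t = B0 X_t + B1 X_{t-1} + eps_t with standard normal noise.
    B0 encodes contemporaneous effects and is assumed strictly lower
    triangular (acyclic); stationarity is assumed, i.e. the lagged dynamics
    must be stable. Returns an (n_steps, d) array."""
    rng = np.random.default_rng(seed)
    d = B0.shape[0]
    # Solve out the contemporaneous structure: X_t = (I - B0)^{-1} (B1 X_{t-1} + eps_t)
    M = np.linalg.inv(np.eye(d) - B0)
    X = np.zeros((n_steps + burn_in, d))
    for t in range(1, n_steps + burn_in):
        X[t] = M @ (B1 @ X[t - 1] + rng.normal(size=d))
    return X[burn_in:]  # drop the transient so the returned samples are ~stationary
```

The burn-in discard matters here: the sortability measures are defined for stationary series, so the initial transient from the zero start must not enter the sample.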
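The sortnregress-style baselines named in the experiment setup can be sketched as follows. This is a simplified marginal-variance variant in the spirit of Reisach et al. (2021), not the paper's exact time-series algorithms; the coefficient threshold and plain least-squares step (rather than a sparse regression) are illustrative simplifications.

```python
import numpy as np

def var_sortnregress(X, thresh=0.1):
    """Order variables by increasing marginal variance, then regress each
    variable on its predecessors and keep coefficients above `thresh` as
    edges. Returns a (d, d) binary adjacency with W[i, j] = 1 iff i -> j."""
    d = X.shape[1]
    order = np.argsort(np.var(X, axis=0))  # candidate causal order
    W = np.zeros((d, d))
    for k in range(1, d):
        target, preds = order[k], order[:k]
        coef, *_ = np.linalg.lstsq(X[:, preds], X[:, target], rcond=None)
        W[preds[np.abs(coef) > thresh], target] = 1
    return W
```

On highly varsortable data this baseline recovers much of the graph despite using no causal-discovery machinery at all, which is why the paper benchmarks score-based methods against it.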