Relational Conformal Prediction for Correlated Time Series

Authors: Andrea Cini, Alexander Jenkins, Danilo Mandic, Cesare Alippi, Filippo Maria Bianchi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate COREL across three experimental settings. In the first one (Sec. 5.1), we compare it against state-of-the-art CP methods operating on the residuals produced by different forecasting models. Then, we analyze COREL in a controlled environment (synthetic dataset). Finally, we assess the effectiveness of the procedure described in Sec. 3.4 in adaptively improving the PIs. ... Empirical results show that COREL achieves state-of-the-art performance compared to existing CP approaches for time series in several datasets and under different scenarios.
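For context, the recipe the CP baselines in Sec. 5.1 share — turn calibration residuals into a quantile and pad the point forecast with it — can be sketched with generic split conformal prediction. This is an illustrative sketch, not COREL's relational procedure; all names are ours:

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_hat_test, alpha=0.1):
    """Build symmetric prediction intervals from calibration residuals.

    cal_residuals: absolute residuals |y - y_hat| on the calibration set.
    y_hat_test: point forecasts on the test set.
    Returns (lower, upper) arrays with marginal coverage >= 1 - alpha.
    """
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level of the calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_residuals, q_level, method="higher")
    return y_hat_test - q, y_hat_test + q

# Toy example with synthetic residuals and zero point forecasts.
rng = np.random.default_rng(0)
res = np.abs(rng.normal(size=1000))
lo, hi = split_conformal_interval(res, np.zeros(5), alpha=0.1)
```

Methods like the ones compared in the paper refine this idea by making the quantile adaptive rather than a single global constant.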
Researcher Affiliation | Collaboration | Andrea Cini 1 2, Alexander Jenkins 1 3, Danilo Mandic 3, Cesare Alippi 1 4, Filippo Maria Bianchi 5 6. 1 IDSIA USI-SUPSI, Università della Svizzera italiana; 2 Swiss National Science Foundation Postdoc Fellow; 3 Imperial College London; 4 Politecnico di Milano; 5 UiT The Arctic University of Norway; 6 NORCE Norwegian Research Centre AS. Correspondence to: Andrea Cini <EMAIL>.
Pseudocode | No | The paper describes methods and procedures in narrative text and mathematical formulations but does not include any clearly labeled pseudocode blocks or algorithms with structured steps.
Open Source Code | Yes | The code for reproducing the computational experiments is available at https://github.com/andreacini/corel.
Open Datasets | Yes | We consider the following datasets, each coming from a different application domain: METR-LA from the traffic forecasting literature (Li et al., 2018); a collection of air quality measurements from different Chinese cities (AQI) (Zheng et al., 2015); a collection of energy consumption profiles acquired from smart meters within the CER smart metering project (CER-E) (Commission for Energy Regulation, 2016; Cini et al., 2022). We follow the preprocessing steps of previous works (Li et al., 2018; Wu et al., 2019; Cini et al., 2023b). For the GPVAR dataset, we generate synthetic data with 40,000 timesteps over an undirected network of 60 nodes connected in a community graph structure by following the system model in Eq. 19 (Zambon and Alippi, 2022).
Dataset Splits | Yes | We adopt 40%/40%/20% splits for training, calibration, and testing, respectively. ... All models were trained by minimizing the MAE loss using the Adam optimizer for 200 epochs with batch size 32, using 40% of the data for training, 40% for calibration, and 20% for testing. We also use the first 25% of the calibration data as a validation set for early stopping.
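Under one plausible reading of the split description (contiguous temporal splits, with the validation block carved out of the start of the calibration segment), the index arithmetic looks like this. The helper is hypothetical and is not taken from the released repository:

```python
def temporal_splits(n_steps, train=0.4, cal=0.4, val_frac_of_cal=0.25):
    """Index ranges for 40/40/20 train/calibration/test splits.

    The first 25% of the calibration segment doubles as a validation
    set for early stopping; whether it is also kept for calibration
    scoring is not stated in the report, so here it is split out.
    """
    t_end = int(n_steps * train)
    c_end = t_end + int(n_steps * cal)
    v_end = t_end + int((c_end - t_end) * val_frac_of_cal)
    return {
        "train": (0, t_end),
        "val": (t_end, v_end),   # first 25% of calibration data
        "cal": (v_end, c_end),   # remaining calibration data
        "test": (c_end, n_steps),
    }

# e.g., the 40,000-step GPVAR series
splits = temporal_splits(40_000)
```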
Hardware Specification | Yes | Experiments were conducted on a server equipped with AMD EPYC 7513 CPUs and NVIDIA RTX A5000 GPUs.
Software Dependencies | No | Benchmarks have been developed with Python (Van Rossum and Drake, 2009) and the following open-source libraries: NumPy (Harris et al., 2020); PyTorch (Paszke et al., 2019); PyTorch Lightning (Falcon and The PyTorch Lightning team, 2019); PyTorch Geometric (Fey and Lenssen, 2019); Torch Spatiotemporal (Cini and Marisca, 2022). The provided text mentions software libraries and their citation years but does not specify exact version numbers for Python or any of the listed libraries (e.g., PyTorch 1.9 rather than just "PyTorch").
Experiment Setup | Yes | For COREL we tuned the number of neurons in the STGNN with a small grid search on 10% of the calibration data. We used the same model selection procedure for CORNN but also tuned the number of GRU layers. For the experiments on real-world data, the model was trained for a maximum of 100 epochs on the calibration set. Each epoch consisted of a maximum of 50 mini-batches of size 64. We used the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 0.003, reduced by 75% every 20 epochs. We used a fixed number of K = 20 neighbors for the graph learning module. ... Base models: We trained three base models (point predictors) for each dataset, including an RNN with GRU cells (1 layer with hidden size 32) and a decoder-only Transformer (hidden size 32, feed-forward size 64, 2 attention heads, 3 layers, dropout 0.1). All models were trained by minimizing the MAE loss using the Adam optimizer for 200 epochs with batch size 32.
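The reported step schedule (initial learning rate 0.003, reduced by 75% — i.e., multiplied by 0.25 — every 20 epochs) corresponds in PyTorch to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.25)`. A plain-Python sketch of the resulting rate per epoch:

```python
def lr_at_epoch(epoch, base_lr=0.003, drop_every=20, factor=0.25):
    """Step learning-rate schedule: multiply the rate by `factor`
    (a 75% reduction) every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)

# First drop at epoch 20: 0.003 -> 0.00075; second at 40 -> 0.0001875
schedule = [lr_at_epoch(e) for e in (0, 19, 20, 40)]
```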