Relational Conformal Prediction for Correlated Time Series

Authors: Andrea Cini, Alexander Jenkins, Danilo Mandic, Cesare Alippi, Filippo Maria Bianchi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate COREL across three experimental settings. In the first one (Sec. 5.1), we compare it against state-of-the-art CP methods operating on the residuals produced by different forecasting models. Then, we analyze COREL in a controlled environment (synthetic dataset). Finally, we assess the effectiveness of the procedure described in Sec. 3.4 in adaptively improving the PIs. ... Empirical results show that COREL achieves state-of-the-art performance compared to existing CP approaches for time series in several datasets and under different scenarios.
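For context, the recipe the CP baselines in Sec. 5.1 share — turn calibration residuals into a quantile and pad the point forecast with it — can be sketched with generic split conformal prediction. This is an illustrative sketch, not COREL's relational procedure; all names are ours:

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_hat_test, alpha=0.1):
    """Build symmetric prediction intervals from calibration residuals.

    cal_residuals: absolute residuals |y - y_hat| on the calibration set.
    y_hat_test: point forecasts on the test set.
    Returns (lower, upper) arrays with marginal coverage >= 1 - alpha.
    """
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level of the calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_residuals, q_level, method="higher")
    return y_hat_test - q, y_hat_test + q

# Toy example with synthetic residuals and zero point forecasts.
rng = np.random.default_rng(0)
res = np.abs(rng.normal(size=1000))
lo, hi = split_conformal_interval(res, np.zeros(5), alpha=0.1)
```

Methods like the ones compared in the paper refine this idea by making the quantile adaptive rather than a single global constant.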
Researcher Affiliation | Collaboration | Andrea Cini 1 2, Alexander Jenkins 1 3, Danilo Mandic 3, Cesare Alippi 1 4, Filippo Maria Bianchi 5 6. 1 IDSIA USI-SUPSI, Università della Svizzera italiana; 2 Swiss National Science Foundation Postdoc Fellow; 3 Imperial College London; 4 Politecnico di Milano; 5 UiT The Arctic University of Norway; 6 NORCE Norwegian Research Centre AS. Correspondence to: Andrea Cini <EMAIL>.
Pseudocode | No | The paper describes methods and procedures in narrative text and mathematical formulations but does not include any clearly labeled pseudocode blocks or algorithms with structured steps.
Open Source Code | Yes | The code for reproducing the computational experiments is available at https://github.com/andreacini/corel.
Open Datasets | Yes | We consider the following datasets, each coming from a different application domain: METR-LA from the traffic forecasting literature (Li et al., 2018); a collection of air quality measurements from different Chinese cities (AQI) (Zheng et al., 2015); a collection of energy consumption profiles acquired from smart meters within the CER smart metering project (CER-E) (Commission for Energy Regulation, 2016; Cini et al., 2022). We follow the preprocessing steps of previous works (Li et al., 2018; Wu et al., 2019; Cini et al., 2023b). For the GPVAR dataset, we generate synthetic data with 40,000 timesteps over an undirected network of 60 nodes connected in a community graph structure by following the system model in Eq. 19 (Zambon and Alippi, 2022).
Dataset Splits | Yes | We adopt 40%/40%/20% splits for training, calibration, and testing, respectively. ... All models were trained by minimizing the MAE loss using the Adam optimizer for 200 epochs with batch size 32, using 40% of the data for training, 40% for calibration, and 20% for testing. We also use the first 25% of the calibration data as a validation set for early stopping.
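Under one plausible reading of the split description (contiguous temporal splits, with the validation block carved out of the start of the calibration segment), the index arithmetic looks like this. The helper is hypothetical and is not taken from the released repository:

```python
def temporal_splits(n_steps, train=0.4, cal=0.4, val_frac_of_cal=0.25):
    """Index ranges for 40/40/20 train/calibration/test splits.

    The first 25% of the calibration segment doubles as a validation
    set for early stopping; whether it is also kept for calibration
    scoring is not stated in the report, so here it is split out.
    """
    t_end = int(n_steps * train)
    c_end = t_end + int(n_steps * cal)
    v_end = t_end + int((c_end - t_end) * val_frac_of_cal)
    return {
        "train": (0, t_end),
        "val": (t_end, v_end),   # first 25% of calibration data
        "cal": (v_end, c_end),   # remaining calibration data
        "test": (c_end, n_steps),
    }

# e.g., the 40,000-step GPVAR series
splits = temporal_splits(40_000)
```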
Hardware Specification | Yes | Experiments were conducted on a server equipped with AMD EPYC 7513 CPUs and NVIDIA RTX A5000 GPUs.
Software Dependencies | No | Benchmarks have been developed with Python (Van Rossum and Drake, 2009) and the following open-source libraries: NumPy (Harris et al., 2020); PyTorch (Paszke et al., 2019); PyTorch Lightning (Falcon and The PyTorch Lightning team, 2019); PyTorch Geometric (Fey and Lenssen, 2019); Torch Spatiotemporal (Cini and Marisca, 2022). The provided text mentions software libraries and their citation years but does not specify exact version numbers for Python or any of the listed libraries (e.g., PyTorch 1.9 rather than just "PyTorch").
Experiment Setup | Yes | For COREL we tuned the number of neurons in the STGNN with a small grid search on 10% of the calibration data. We used the same model selection procedure for CORNN but also tuned the number of GRU layers. For the experiments on real-world data, the model was trained for a maximum of 100 epochs on the calibration set. Each epoch consisted of a maximum of 50 mini-batches of size 64. We used the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 0.003, reduced by 75% every 20 epochs. We used a fixed number of K = 20 neighbors for the graph learning module. ... Base models: We trained three base models (point predictors) for each dataset, including an RNN with GRU cells (1 layer with hidden size 32) and a decoder-only Transformer (hidden size 32, feed-forward size 64, 2 attention heads, 3 layers, dropout 0.1). All models were trained by minimizing the MAE loss using the Adam optimizer for 200 epochs with batch size 32.
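The reported step schedule (initial learning rate 0.003, reduced by 75% — i.e., multiplied by 0.25 — every 20 epochs) corresponds in PyTorch to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.25)`. A plain-Python sketch of the resulting rate per epoch:

```python
def lr_at_epoch(epoch, base_lr=0.003, drop_every=20, factor=0.25):
    """Step learning-rate schedule: multiply the rate by `factor`
    (a 75% reduction) every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)

# First drop at epoch 20: 0.003 -> 0.00075; second at 40 -> 0.0001875
schedule = [lr_at_epoch(e) for e in (0, 19, 20, 40)]
```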