Robust Spatio-Temporal Centralized Interaction for OOD Learning

Authors: Jiaming Ma, Binwu Wang, Pengkun Wang, Zhengyang Zhou, Xu Wang, Yang Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with 14 baselines across six datasets, STOP achieves up to 17.01% improvement in generalization performance and 18.44% improvement in inductive learning performance. In this section, we conduct a comprehensive evaluation of the proposed model.
Researcher Affiliation | Academia | 1 University of Science and Technology of China (USTC), Hefei, China; 2 Suzhou Institute for Advanced Research, USTC, Suzhou, China. First author email: Jiaming EMAIL. Correspondence to: Binwu Wang and Yang Wang (corresponding authors) <EMAIL and EMAIL>.
Pseudocode | Yes | We provide the pseudocode of the algorithm in Algorithm 1, where we can observe that STOP makes final predictions based on the temporal component and the spatial component. This includes a perturbation process to extract robust knowledge; the perturbation occurs only in the training phase and is no longer used in the test phase. We also provide the optimization flow of GenPU and the model parameters in Algorithm 2.
Open Source Code | Yes | The code is available at https://github.com/PoorOtterBob/STOP.
Open Datasets | Yes | We conduct a comprehensive evaluation of our model on six spatio-temporal datasets spanning multiple years across two domains. These datasets include LargeST (Liu et al., 2024b) and PEMSD3-Stream (Chen et al., 2021) in the traffic domain, and KnowAir (Wang et al., 2020) in the atmospheric domain. The dataset summary is presented in Table 1.
Dataset Splits | Yes | The training set comprises the first 60% of data from the initial year's dataset, while the following 20% of data is used as the validation set. In each subsequent year, the last 20% of data is designated as the test set. This setup aims to accentuate the temporal distribution difference between the test and training sets, while maintaining a ratio of approximately 6:2:2 for the training, validation, and test sets. Regarding structural shift evaluation, we select a subset of nodes for training and validation. In the test set, we randomly mask 10% of nodes to simulate node disappearance and add 30% of nodes as new nodes to simulate shifts in the graph structure and scale.
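The split protocol quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, array layout (time-major NumPy arrays, integer node IDs), and the use of `numpy.random.Generator` are all assumptions.

```python
import numpy as np

def split_initial_year(data, train_ratio=0.6, val_ratio=0.2):
    """Split the initial-year series along time: first 60% train, next 20% val.

    The remaining 20% (and the last 20% of each later year) serves as test data.
    """
    T = data.shape[0]
    train_end = int(T * train_ratio)
    val_end = int(T * (train_ratio + val_ratio))
    return data[:train_end], data[train_end:val_end]

def structural_shift(node_ids, rng, mask_ratio=0.10, add_ratio=0.30):
    """Simulate structural shift for the test graph.

    Randomly masks 10% of the original nodes (node disappearance) and appends
    30% new node IDs (graph growth), matching the ratios quoted in the report.
    """
    node_ids = np.asarray(node_ids)
    n = len(node_ids)
    kept = rng.choice(node_ids, size=n - int(n * mask_ratio), replace=False)
    first_new = node_ids.max() + 1
    new_nodes = np.arange(first_new, first_new + int(n * add_ratio))
    return np.concatenate([np.sort(kept), new_nodes])
```

For example, a 10-node graph with these ratios yields a 12-node test graph (9 surviving nodes plus 3 new ones).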
Hardware Specification | Yes | We implement all models using the PyTorch framework with Python 3.8.3, leveraging an Nvidia A100-PCIE-40GB GPU. MAE, RMSE, and MAPE are used as metrics for comparison.
Software Dependencies | No | We implement all models using the PyTorch framework with Python 3.8.3, leveraging an Nvidia A100-PCIE-40GB GPU. MAE, RMSE, and MAPE are used as metrics for comparison. Although Python is given a specific version (3.8.3), PyTorch is mentioned only as a 'framework' without a version number, so the requirement of multiple versioned software components is not met.
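The three metrics named above are standard; a minimal NumPy sketch is given below. The `eps` guard in MAPE is an assumption of this sketch (the report does not quote how the paper handles zero targets), and the function names are illustrative.

```python
import numpy as np

def mae(pred, true):
    # Mean Absolute Error: average magnitude of the prediction error.
    return np.mean(np.abs(pred - true))

def rmse(pred, true):
    # Root Mean Squared Error: penalizes large errors more heavily than MAE.
    return np.sqrt(np.mean((pred - true) ** 2))

def mape(pred, true, eps=1e-8):
    # Mean Absolute Percentage Error, in percent; eps guards against
    # division by zero (assumed here; zero targets do occur in traffic data).
    return np.mean(np.abs((pred - true) / (true + eps))) * 100.0
```

For instance, with predictions [2, 4] against targets [1, 5], MAE and RMSE are both 1.0 and MAPE is 60%.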
Experiment Setup | Yes | We set both the input and prediction windows to 12 for traffic prediction and 24 for atmospheric prediction. The temporal decomposition kernel size ξ is 3 for the traffic datasets and 7 for KnowAir. The number of ConAU K is set to {8, 24, 32, 64, 8, 4} and the number of GenPU M to {3, 3, 3, 3, 2, 4} for the six datasets in Table 1. The dimension of the embeddings is set to 64. We use 8 heads in the multi-head low-rank attention.
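The quoted hyperparameters can be collected into a configuration sketch. The key names below are hypothetical, and the per-dataset order of K and M simply follows the paper's Table 1, which is not reproduced in this report, so the lists are index-aligned placeholders rather than a named mapping.

```python
# Hypothetical configuration collecting the reported hyperparameters.
CONFIG = {
    "window": {"traffic": 12, "atmospheric": 24},  # input = prediction horizon
    "kernel_xi": {"traffic": 3, "KnowAir": 7},     # temporal decomposition kernel
    "num_con_au_K": [8, 24, 32, 64, 8, 4],         # one entry per dataset (Table 1 order)
    "num_gen_pu_M": [3, 3, 3, 3, 2, 4],            # one entry per dataset (Table 1 order)
    "embed_dim": 64,                               # embedding dimension
    "attn_heads": 8,                               # multi-head low-rank attention
}
```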