TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Authors: Peiyuan Liu, Beiliang Wu, Yifan Hu, Naiqi Li, Tao Dai, Jigang Bao, Shu-Tao Xia

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting. Additionally, TimeBridge demonstrates exceptional performance in financial forecasting on the CSI 500 and S&P 500 indices, further validating its robustness and effectiveness.
Researcher Affiliation | Academia | ¹Tsinghua Shenzhen International Graduate School, ²Shenzhen University. Correspondence to: Tao Dai <EMAIL>, Naiqi Li <EMAIL>.
Pseudocode | No | The paper describes the TimeBridge framework with components like Patch Embedding, Integrated Attention, Patch Downsampling, and Cointegrated Attention. However, it does not present these components or the overall method in a structured pseudocode or algorithm block. The methodology is explained through descriptive text and diagrams.
Open Source Code | Yes | Code is available at https://github.com/Hank0626/TimeBridge.
Open Datasets | Yes | We conduct long-term forecasting experiments on several widely-used real-world datasets, including the Electricity Transformer Temperature (ETT) dataset with its four subsets (ETTh1, ETTh2, ETTm1, ETTm2) (Wu et al., 2021; Miao et al., 2024a), as well as Weather, Electricity, Traffic, and Solar (Liu et al., 2025a;b). These datasets exhibit strong non-stationary characteristics, detailed in Appendix D. Following previous works (Zhou et al., 2021; Wu et al., 2021), we use Mean Square Error (MSE) and Mean Absolute Error (MAE) as evaluation metrics. We set the input length I to 720 for our method. For other baselines, we adopt the setting that searches for the optimal input length I and other hyperparameters. Details of the metric and the searching process can be found in Appendix C.1 and Appendix F.1.
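As a concrete illustration of the evaluation metrics named above, here is a minimal NumPy sketch (the function names `mse` and `mae` are my own, not taken from the paper's codebase):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Square Error over all forecast points."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error over all forecast points."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy example: a length-4 forecast against ground truth
truth = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.5, 2.0, 2.5, 4.0])
print(mse(truth, pred))  # 0.125
print(mae(truth, pred))  # 0.25
```

In long-term forecasting benchmarks, both metrics are typically averaged over every variable and every step of the prediction horizon, which is what the flat `np.mean` over the whole array does here.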
Dataset Splits | Yes | Dataset Size denotes the total number of time points in the (Train, Validation, Test) splits respectively. Prediction Length denotes the future time points to be predicted. Frequency denotes the sampling interval of time points.
Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on two NVIDIA RTX 3090 24GB GPUs.
Software Dependencies | No | All experiments are implemented in PyTorch (Paszke et al., 2019) and conducted on two NVIDIA RTX 3090 24GB GPUs. We use the Adam optimizer (Kingma, 2014) with a learning rate selected from {1e-3, 1e-4, 5e-4}. While PyTorch is mentioned, a specific version number for the library itself is not provided. Adam is an algorithm, not a software dependency with a version number.
Experiment Setup | Yes | We use the Adam optimizer (Kingma, 2014) with a learning rate selected from {1e-3, 1e-4, 5e-4}. The number of patches N is set accordingly to different datasets. We adopt a hybrid MAE loss that operates in both the time and frequency domains for stable training (Wang et al., 2024a). For additional details on hyperparameter settings and loss function, please refer to Appendix E. Table 8 details hyperparameter settings for different datasets, including lr, d_model, d_ff, and alpha.
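The hybrid MAE loss is attributed to Wang et al. (2024a) and is not spelled out in this excerpt, so the following is only a plausible NumPy sketch of a time-plus-frequency MAE combined with a balancing weight `alpha` (the function name and the exact form of the frequency term are assumptions, though `alpha` is listed among the paper's hyperparameters):

```python
import numpy as np

def hybrid_mae_loss(pred: np.ndarray, target: np.ndarray, alpha: float = 0.5) -> float:
    """Weighted sum of a time-domain MAE and a frequency-domain MAE.

    alpha balances the two terms; it would be tuned per dataset,
    analogous to the `alpha` hyperparameter in the paper's Table 8.
    """
    # Time-domain term: ordinary MAE on the raw series
    time_term = np.mean(np.abs(pred - target))
    # Frequency-domain term: MAE between magnitudes of the real FFT,
    # penalizing mismatched spectral content (one possible choice)
    freq_term = np.mean(np.abs(np.abs(np.fft.rfft(pred)) - np.abs(np.fft.rfft(target))))
    return float(alpha * time_term + (1.0 - alpha) * freq_term)

# Toy example: a sine wave vs. a vertically shifted copy
series = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
noisy = series + 0.1
loss = hybrid_mae_loss(noisy, series, alpha=0.5)
```

A frequency-domain term of this kind is often added because a pure time-domain MAE can be minimized by over-smoothed forecasts that miss periodic structure; matching spectral magnitudes discourages that failure mode.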