Retrieval Augmented Time Series Forecasting
Authors: Sungwon Han, Seungeon Lee, Meeyoung Cha, Sercan O Arik, Jinsung Yoon
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations on ten benchmark datasets show that RAFT consistently outperforms contemporary baselines with an average win ratio of 86%. |
| Researcher Affiliation | Collaboration | (1) Department of AI Convergence, GIST, Gwangju, South Korea; this work was done while the author was at KAIST. (2) School of Computing, KAIST, Daejeon, South Korea. (3) Data Science for Humanity Group, Max Planck Institute for Security and Privacy, Bochum, Germany. (4) Google Cloud AI, Sunnyvale, United States. Correspondence to: Jinsung Yoon <EMAIL>. |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual descriptions (e.g., Eq. 1-9), but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block format. |
| Open Source Code | Yes | Code is available at https://github.com/archon159/RAFT |
| Open Datasets | Yes | We consider ten different benchmark datasets, each with a diverse range of variates, dataset lengths, and frequencies: (1-4) The ETT dataset... (5) The Electricity dataset records household electric power consumption over approximately 4 years (Trindade, 2015); (6) The Exchange dataset includes the daily exchange rates of eight countries over 27 years (1990-2016) (Lai et al., 2018); (7) The Illness dataset includes the weekly ratio of patients with influenza-like illness over 20 years (2002-2021); (8) The Solar dataset contains 10-minute solar power forecasts collected from power plants in 2006 (Liu et al., 2022a); (9) The Traffic dataset contains hourly road occupancy rates on freeways over 48 months; and (10) The Weather dataset consists of 21 weather-related indicators in Germany over one year. |
| Dataset Splits | Yes | The dataset size is presented as (Train, Validation, Test). The detailed information of each dataset is shown in Table 5. Table 5 (basic information of datasets used for evaluation) has the columns Dataset, # of variates, Dataset Size, and Frequency; its first row reads: ETTh1, 7, (8449, 2785, 2785), Hourly. |
| Hardware Specification | Yes | For all experiments, the average results from three runs are reported, with each experiment conducted on a single NVIDIA A100 40GB GPU. |
| Software Dependencies | No | For implementation, we referred to the publicly available time-series repository (TSLib). The paper does not provide specific version numbers for software dependencies like Python, PyTorch, or the TSLib library itself. |
| Experiment Setup | Yes | RAFT employs the retrieval module with the following detailed settings. The periods are set to {1, 2, 4} (n = 3), following existing literature (Wang et al., 2024), and the temperature τ is set to 0.1. Batch size is set to 32. The initial learning rate, the number of patches used in the retrieval (m), and the size of the look-back window (L) are determined via grid search based on performance on the validation set, following prior work (Wang et al., 2024). For fair comparison, hyper-parameter tuning was performed for both our model and all baselines using the validation set. The learning rate is chosen from 1e-5 to 0.05, the look-back window size from {96, 192, 336, 720}, and the number of patches used in retrieval m from {1, 5, 10, 20}. |
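
The search space quoted in the Experiment Setup row can be written out as a small grid. The following is a minimal sketch, not the authors' tuning script. The fixed values (periods, temperature, batch size) and the candidate sets for L and m come from the row above. The learning-rate grid points are an assumption, since the row gives only the range 1e-5 to 0.05. The names `build_grid`, `select_best`, and `validation_loss` are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Fixed settings quoted in the Experiment Setup row.
FIXED = {"periods": (1, 2, 4), "temperature": 0.1, "batch_size": 32}

# Candidate sets quoted in the Experiment Setup row.
LOOK_BACK_WINDOWS = (96, 192, 336, 720)   # L
NUM_RETRIEVED_PATCHES = (1, 5, 10, 20)    # m

# ASSUMPTION: the row only states the range 1e-5 to 0.05;
# these specific grid points are illustrative, not from the paper.
LEARNING_RATES = (1e-5, 1e-4, 1e-3, 1e-2, 5e-2)


def build_grid():
    """Return every candidate configuration as a dict."""
    grid = []
    for lr, look_back, m in product(
        LEARNING_RATES, LOOK_BACK_WINDOWS, NUM_RETRIEVED_PATCHES
    ):
        grid.append(
            dict(FIXED, learning_rate=lr, look_back=look_back, num_patches=m)
        )
    return grid


def select_best(grid, validation_loss):
    """Pick the configuration with the lowest validation loss.

    `validation_loss` is a hypothetical callable that would train a model
    with the given configuration and return its error on the validation set.
    """
    return min(grid, key=validation_loss)


if __name__ == "__main__":
    print(f"{len(build_grid())} candidate configurations")
```

In practice `validation_loss` would wrap a full training run on the validation split; selecting on validation error rather than test error matches the row's statement that tuning was done on the validation set for both the model and the baselines.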