Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Authors: Luca Masserano, Abdul Fatir Ansari, Boran Han, Xiyuan Zhang, Christos Faloutsos, Michael W. Mahoney, Andrew Gordon Wilson, Youngsuk Park, Syama Sundar Rangapuram, Danielle C. Maddix, Bernie Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) performs on par with or better than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and is competitive with modern deep learning models trained specifically on each dataset; ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics; and iii) easily captures complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with frequencies evolving over time. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Amazon Web Services 3Amazon 4New York University. Correspondence to: Luca Masserano <EMAIL>, Abdul Fatir Ansari <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: False Discovery Rate of Coefficients (FDRC). 1: For each d_{k,j}, compute the two-sided p-value p_{k,j} = 2(1 − Φ(\|d_{k,j}\|/σ)) for H_0^{(k,j)}: d_{k,j} = 0. 2: Order the p_{k,j} so that p_{(1)} ≤ … ≤ p_{(m)}. 3: Let i_0 = arg max_i {p_{(i)} ≤ (i/m)q}, with q the error rate under H_0. 4: Let λ_{i_0} = σΦ^{−1}(1 − p_{i_0}/2). 5: Threshold all detail coefficients at level λ_{i_0}. |
| Open Source Code | Yes | A reference implementation of WaveToken is available at https://github.com/amazon-science/chronos-forecasting/tree/wavetoken. |
| Open Datasets | Yes | We train and evaluate WaveToken on the publicly available datasets comprehensively collected by Ansari et al. (2024). |
| Dataset Splits | Yes | We train and evaluate WaveToken on the publicly available datasets comprehensively collected by Ansari et al. (2024). These span a variety of domains and exhibit diverse properties in terms of size, frequencies, and prediction lengths, and can be divided into: i) pre-training only: datasets used exclusively for training (13 datasets); ii) in-domain: datasets employed for training, whose validation set is also used for evaluation (Benchmark I, 15 datasets); and iii) zero-shot: datasets used solely for evaluation (Benchmark II, 27 datasets). |
| Hardware Specification | Yes | We pre-train WaveToken with T5 models of four sizes: Mini (19.2M), Small (44.5M), Base (199M), and Large (705.8M), for 200K steps on 8 A100 GPUs. |
| Software Dependencies | No | For the Seasonal Naive baseline, we relied on the implementation available in the StatsForecast library (Garza et al., 2022). For the task-specific deep learning models, we used their implementations available in the GluonTS library (Alexandrov et al., 2020). Finally, we used the corresponding reference implementations for Lag-Llama (Rasul et al., 2023), Moirai (Woo et al., 2024), TimesFM (Das et al., 2023), and Chronos (Ansari et al., 2024). No specific version numbers for these libraries are provided. |
| Experiment Setup | Yes | The context length of the sequences in each training batch is set to 512 and the prediction length is set to 64. In addition, we adopt the data augmentation techniques introduced by Ansari et al. (2024): each sequence is generated with probability 0.9 from a TSMixup set, which takes convex combinations of different time series, and with probability 0.1 from a synthetic dataset generated from Gaussian processes with randomly combined kernels. We pre-train WaveToken with T5 models of four sizes: Mini (19.2M), Small (44.5M), Base (199M), and Large (705.8M), for 200K steps. |
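As a rough illustration of the pseudocode row above, the FDRC procedure (Algorithm 1) can be sketched in Python. This is a minimal sketch, not the paper's implementation: the function name, variable names, and the example coefficients are ours, and it assumes a known noise scale σ for the detail coefficients.

```python
import numpy as np
from scipy.stats import norm

def fdrc_threshold(detail_coeffs, sigma, q=0.05):
    """FDR-based threshold for wavelet detail coefficients.

    Sketch of Algorithm 1 (FDRC); names are illustrative, not the paper's.
    """
    d = np.asarray(detail_coeffs, dtype=float).ravel()
    m = d.size
    # Step 1: two-sided p-values for H0: d_{k,j} = 0
    p = 2.0 * (1.0 - norm.cdf(np.abs(d) / sigma))
    # Step 2: order the p-values, p_(1) <= ... <= p_(m)
    p_sorted = np.sort(p)
    # Step 3: largest i with p_(i) <= (i/m) * q (Benjamini-Hochberg step)
    i = np.arange(1, m + 1)
    passing = np.nonzero(p_sorted <= (i / m) * q)[0]
    if passing.size == 0:
        return np.inf  # nothing significant: threshold away all coefficients
    p_i0 = p_sorted[passing[-1]]
    # Step 4: map the critical p-value back to a coefficient magnitude
    return sigma * norm.ppf(1.0 - p_i0 / 2.0)

# Step 5: hard-threshold all detail coefficients at level lambda_{i0}
coeffs = np.array([0.1, -0.05, 3.2, 0.02, -2.9, 0.15])
lam = fdrc_threshold(coeffs, sigma=0.5, q=0.05)
thresholded = np.where(np.abs(coeffs) >= lam, coeffs, 0.0)
```

With these example values, the two large-magnitude coefficients yield tiny p-values and pass the step-3 test, so the resulting threshold zeroes out the four small (noise-like) coefficients while keeping the large ones.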