Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Authors: Luca Masserano, Abdul Fatir Ansari, Boran Han, Xiyuan Zhang, Christos Faloutsos, Michael W. Mahoney, Andrew Gordon Wilson, Youngsuk Park, Syama Sundar Rangapuram, Danielle C. Maddix, Bernie Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) performs on par with or better than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and is competitive with modern deep learning models trained specifically on each dataset; ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics; and iii) easily captures complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with frequencies evolving over time. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Amazon Web Services 3Amazon 4New York University. Correspondence to: Luca Masserano <EMAIL>, Abdul Fatir Ansari <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: False Discovery Rate of Coefficients (FDRC). 1: For each d_{k,j}, compute the two-sided p-value p_{k,j} = 2(1 − Φ(\|d_{k,j}\|/σ)) for H_0^{(k,j)}: d_{k,j} = 0. 2: Order the p_{k,j} so that p_{(1)} ≤ … ≤ p_{(m)}. 3: Let i_0 = arg max_i {p_{(i)} ≤ (i/m)q}, with q the error rate under H_0. 4: Let λ_{i_0} = σΦ^{−1}(1 − p_{i_0}/2). 5: Threshold all detail coefficients at level λ_{i_0}. |
| Open Source Code | Yes | A reference implementation of WaveToken is available at https://github.com/amazon-science/chronos-forecasting/tree/wavetoken. |
| Open Datasets | Yes | We train and evaluate WaveToken on the publicly available datasets comprehensively collected by Ansari et al. (2024). |
| Dataset Splits | Yes | We train and evaluate WaveToken on the publicly available datasets comprehensively collected by Ansari et al. (2024). These span a variety of domains and exhibit diverse properties in terms of size, frequencies, and prediction lengths, and can be divided into: i) pre-training only: datasets used exclusively for training (13 datasets); ii) in-domain: datasets employed for training, whose validation set is also used for evaluation (Benchmark I, 15 datasets); and iii) zero-shot: datasets used solely for evaluation (Benchmark II, 27 datasets). |
| Hardware Specification | Yes | We pre-train WaveToken with T5 models of four sizes: Mini (19.2M), Small (44.5M), Base (199M), and Large (705.8M), for 200K steps on 8 A100 GPUs. |
| Software Dependencies | No | For the Seasonal Naive baseline, we relied on the implementation available in the StatsForecast library (Garza et al., 2022). For the task-specific deep learning models, we used their implementations available in the GluonTS library (Alexandrov et al., 2020). Finally, we used the corresponding reference implementations for Lag-Llama (Rasul et al., 2023), Moirai (Woo et al., 2024), TimesFM (Das et al., 2023), and Chronos (Ansari et al., 2024). No specific version numbers for these libraries are provided. |
| Experiment Setup | Yes | The context length of the sequences in each training batch is set to 512 and the prediction length is set to 64. In addition, we adopt the data augmentation techniques introduced by Ansari et al. (2024): each sequence is generated with probability 0.9 from a TSMixup set, which takes convex combinations of different time series, and with probability 0.1 from a synthetic dataset generated from Gaussian processes with randomly combined kernels. We pre-train WaveToken with T5 models of four sizes: Mini (19.2M), Small (44.5M), Base (199M), and Large (705.8M), for 200K steps. |
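As a rough illustration of the pseudocode row above, the FDRC procedure (Algorithm 1) can be sketched in Python. This is a minimal sketch, not the paper's implementation: the function name, variable names, and the example coefficients are ours, and it assumes a known noise scale σ for the detail coefficients.

```python
import numpy as np
from scipy.stats import norm

def fdrc_threshold(detail_coeffs, sigma, q=0.05):
    """FDR-based threshold for wavelet detail coefficients.

    Sketch of Algorithm 1 (FDRC); names are illustrative, not the paper's.
    """
    d = np.asarray(detail_coeffs, dtype=float).ravel()
    m = d.size
    # Step 1: two-sided p-values for H0: d_{k,j} = 0
    p = 2.0 * (1.0 - norm.cdf(np.abs(d) / sigma))
    # Step 2: order the p-values, p_(1) <= ... <= p_(m)
    p_sorted = np.sort(p)
    # Step 3: largest i with p_(i) <= (i/m) * q (Benjamini-Hochberg step)
    i = np.arange(1, m + 1)
    passing = np.nonzero(p_sorted <= (i / m) * q)[0]
    if passing.size == 0:
        return np.inf  # nothing significant: threshold away all coefficients
    p_i0 = p_sorted[passing[-1]]
    # Step 4: map the critical p-value back to a coefficient magnitude
    return sigma * norm.ppf(1.0 - p_i0 / 2.0)

# Step 5: hard-threshold all detail coefficients at level lambda_{i0}
coeffs = np.array([0.1, -0.05, 3.2, 0.02, -2.9, 0.15])
lam = fdrc_threshold(coeffs, sigma=0.5, q=0.05)
thresholded = np.where(np.abs(coeffs) >= lam, coeffs, 0.0)
```

With these example values, the two large-magnitude coefficients yield tiny p-values and pass the step-3 test, so the resulting threshold zeroes out the four small (noise-like) coefficients while keeping the large ones.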