TSMixer: An All-MLP Architecture for Time Series Forecasting

Authors: Si-An Chen, Chun-Liang Li, Sercan Ö. Arık, Nathanael Christian Yoder, Tomas Pfister

TMLR 2023

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — "We evaluate TSMixer on seven popular multivariate long-term forecasting benchmarks and a large-scale real-world retail dataset, M5 (Makridakis et al., 2022). The long-term forecasting datasets cover various applications such as weather, electricity, and traffic, and are comprised of multivariate time series without auxiliary information. The M5 dataset is for the competition task of predicting the sales of various items at Walmart. It is a large-scale dataset containing 30,490 time series with static features such as store locations, as well as time-varying features such as campaign information. This complexity renders M5 a more challenging benchmark to explore the potential benefits of cross-variate information and auxiliary features. The statistics of these datasets are presented in Table 2."
Researcher Affiliation: Collaboration — Si-An Chen (EMAIL; National Taiwan University, Google Cloud AI Research); Chun-Liang Li (EMAIL; Google Cloud AI Research); Nathanael C. Yoder (EMAIL; Google Cloud AI Research); Sercan Ö. Arık (EMAIL; Google Cloud AI Research); Tomas Pfister (tpfister@google.com; Google Cloud AI Research)
Pseudocode: No — The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes the architecture and its components in text and diagrams (e.g., Figures 1, 3, 4, and 6) and provides mathematical formulae in Appendix B, but these are not formatted as pseudocode or algorithm blocks.
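Since the paper provides no pseudocode, a minimal numpy sketch of the two residual mixing operations it describes (an MLP applied along the time axis, then an MLP applied along the feature axis) may help. Function and weight names here are illustrative, not from the paper, and normalization layers are omitted for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def time_mixing(x, w, b):
    # x: (L, C) window of L time steps, C channels.
    # The MLP acts along the time dimension, shared across channels,
    # with a residual connection. w: (L, L), b: (L, 1).
    return x + relu(w @ x + b)

def feature_mixing(x, w1, b1, w2, b2):
    # The MLP acts along the channel dimension, per time step,
    # again with a residual connection. w1: (C, H), w2: (H, C).
    h = relu(x @ w1 + b1)
    return x + h @ w2 + b2

# One "mixer layer" is time_mixing followed by feature_mixing;
# the full model stacks several such layers before a linear projection
# from L input steps to T forecast steps.
```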
Open Source Code: Yes — The implementation is available at https://github.com/google-research/google-research/tree/master/tsmixer.
Open Datasets: Yes — "We evaluate TSMixer on seven popular multivariate long-term forecasting benchmarks and a large-scale real-world retail dataset, M5 (Makridakis et al., 2022). The long-term forecasting datasets cover various applications such as weather, electricity, and traffic, and are comprised of multivariate time series without auxiliary information. The M5 dataset is for the competition task of predicting the sales of various items at Walmart."
Dataset Splits: Yes — Data partitions (Train/Validation/Test): 12/4/4 (months), 7:2:1, and 1886/28/28 (days), depending on the dataset. "For the long-term forecasting datasets, we follow the settings in recent research (Liu et al., 2022b; Zhou et al., 2022a; Nie et al., 2023). We set the input length L = 512 as suggested in Nie et al. (2023) and evaluate the results for prediction lengths of T = {96, 192, 336, 720}. For the M5 dataset, we mostly follow the data processing from Alexandrov et al. (2020). We consider the prediction length of T = 28 (same as the competition), and set the input length to L = 35."
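Under this protocol, training examples are typically formed by sliding a window of L input steps plus T target steps over each series. A pure-Python sketch, with illustrative names (the actual loaders follow the cited prior work):

```python
def make_windows(series, input_len=512, pred_len=96):
    """Return (input, target) pairs for every valid window position.

    input_len/pred_len default to the paper's long-term forecasting
    setting: L = 512 and the smallest horizon T = 96.
    """
    pairs = []
    n = len(series)
    for start in range(n - input_len - pred_len + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + pred_len]
        pairs.append((x, y))
    return pairs
```

For M5 the same scheme would use `input_len=35` and `pred_len=28`, matching the competition horizon.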
Hardware Specification: Yes — "All models are trained on a single NVIDIA Tesla V100 GPU."
Software Dependencies: No — "All models are implemented in PyTorch, except TFT, which is implemented in MXNet. Our implementation is based on GluonTS. We use the TFT and DeepAR provided in GluonTS, and implement PatchTST, FEDformer, and our TSMixer ourselves." The paper mentions PyTorch, MXNet, and GluonTS but does not specify version numbers.
Experiment Setup: Yes — "We set the input length L = 512 as suggested in Nie et al. (2023) and evaluate the results for prediction lengths of T = {96, 192, 336, 720}. We use the Adam optimization algorithm (Kingma & Ba, 2015) to minimize the mean squared error (MSE) training objective, and consider MSE and mean absolute error (MAE) as the evaluation metrics. We apply reversible instance normalization (Kim et al., 2022) to ensure a fair comparison with the state-of-the-art PatchTST (Nie et al., 2023). For the M5 dataset, we mostly follow the data processing from Alexandrov et al. (2020). We consider the prediction length of T = 28 (the same as the competition) and set the input length to L = 35. We optimize the log-likelihood of the negative binomial distribution as suggested by Salinas et al. (2020). We follow the competition's protocol (Makridakis et al., 2022) to aggregate the predictions at different levels and evaluate them using the weighted root mean squared scaled error (WRMSSE). More details about the experimental setup and hyperparameter tuning can be found in Appendices C and E. For the long-term forecasting datasets, we train each model for a maximum of 100 epochs and stop early if the validation loss does not improve for 5 epochs; for M5, we train for a maximum of 300 epochs and stop early if the validation loss does not improve for 30 epochs. We noticed that optimizing other objective functions might yield significantly worse results when evaluated with WRMSSE. To obtain more stable results, for all models we take the top 8 hyperparameter settings based on validation WRMSSE, train them for an additional 4 trials (totaling 5), select the best hyperparameters based on their mean validation WRMSSE, and then report the evaluation results on the test set. The hyperparameter settings can be found in Appendix E."
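The early-stopping rule described above (stop once validation loss has not improved for a fixed number of epochs) can be sketched as follows; the function and argument names are illustrative, not from the paper's code:

```python
def train_with_early_stopping(max_epochs, patience, val_loss_fn):
    """Run up to max_epochs, stopping once validation loss has not
    improved for `patience` consecutive epochs.

    The paper uses max_epochs=100, patience=5 for the long-term
    forecasting datasets and max_epochs=300, patience=30 for M5.
    `val_loss_fn(epoch)` stands in for one epoch of training plus
    validation; it returns that epoch's validation loss.
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        loss = val_loss_fn(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_loss, best_epoch
```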