TSMixer: An All-MLP Architecture for Time Series Forecasting

Authors: Si-An Chen, Chun-Liang Li, Sercan Ö. Arık, Nathanael Christian Yoder, Tomas Pfister

TMLR 2023

Reproducibility Variable: Result — LLM Response
Research Type: Experimental — "We evaluate TSMixer on seven popular multivariate long-term forecasting benchmarks and a large-scale real-world retail dataset, M5 (Makridakis et al., 2022). The long-term forecasting datasets cover various applications such as weather, electricity, and traffic, and are comprised of multivariate time series without auxiliary information. The M5 dataset is for the competition task of predicting the sales of various items at Walmart. It is a large-scale dataset containing 30,490 time series with static features such as store locations, as well as time-varying features such as campaign information. This complexity renders M5 a more challenging benchmark to explore the potential benefits of cross-variate information and auxiliary features. The statistics of these datasets are presented in Table 2."
Researcher Affiliation: Collaboration — Si-An Chen (EMAIL; National Taiwan University, Google Cloud AI Research); Chun-Liang Li (EMAIL; Google Cloud AI Research); Nathanael C. Yoder (EMAIL; Google Cloud AI Research); Sercan Ö. Arık (EMAIL; Google Cloud AI Research); Tomas Pfister (tpfister@google.com; Google Cloud AI Research)
Pseudocode: No — The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes the architecture and its components in text and diagrams (e.g., Figures 1, 3, 4, and 6) and provides mathematical formulae in Appendix B, but these are not formatted as pseudocode or algorithm blocks.
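Since the paper provides no pseudocode, a minimal numpy sketch of the two residual mixing operations it describes (an MLP applied along the time axis, then an MLP applied along the feature axis) may help. Function and weight names here are illustrative, not from the paper, and normalization layers are omitted for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def time_mixing(x, w, b):
    # x: (L, C) window of L time steps, C channels.
    # The MLP acts along the time dimension, shared across channels,
    # with a residual connection. w: (L, L), b: (L, 1).
    return x + relu(w @ x + b)

def feature_mixing(x, w1, b1, w2, b2):
    # The MLP acts along the channel dimension, per time step,
    # again with a residual connection. w1: (C, H), w2: (H, C).
    h = relu(x @ w1 + b1)
    return x + h @ w2 + b2

# One "mixer layer" is time_mixing followed by feature_mixing;
# the full model stacks several such layers before a linear projection
# from L input steps to T forecast steps.
```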
Open Source Code: Yes — The implementation is available at https://github.com/google-research/google-research/tree/master/tsmixer.
Open Datasets: Yes — "We evaluate TSMixer on seven popular multivariate long-term forecasting benchmarks and a large-scale real-world retail dataset, M5 (Makridakis et al., 2022). The long-term forecasting datasets cover various applications such as weather, electricity, and traffic, and are comprised of multivariate time series without auxiliary information. The M5 dataset is for the competition task of predicting the sales of various items at Walmart."
Dataset Splits: Yes — Data partitions (Train/Validation/Test): 12/4/4 (months), 7:2:1, and 1886/28/28 (days), depending on the dataset. "For the long-term forecasting datasets, we follow the settings in recent research (Liu et al., 2022b; Zhou et al., 2022a; Nie et al., 2023). We set the input length L = 512 as suggested in Nie et al. (2023) and evaluate the results for prediction lengths of T = {96, 192, 336, 720}. For the M5 dataset, we mostly follow the data processing from Alexandrov et al. (2020). We consider the prediction length of T = 28 (same as the competition), and set the input length to L = 35."
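Under this protocol, training examples are typically formed by sliding a window of L input steps plus T target steps over each series. A pure-Python sketch, with illustrative names (the actual loaders follow the cited prior work):

```python
def make_windows(series, input_len=512, pred_len=96):
    """Return (input, target) pairs for every valid window position.

    input_len/pred_len default to the paper's long-term forecasting
    setting: L = 512 and the smallest horizon T = 96.
    """
    pairs = []
    n = len(series)
    for start in range(n - input_len - pred_len + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + pred_len]
        pairs.append((x, y))
    return pairs
```

For M5 the same scheme would use `input_len=35` and `pred_len=28`, matching the competition horizon.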
Hardware Specification: Yes — "All models are trained on a single NVIDIA Tesla V100 GPU."
Software Dependencies: No — "All models are implemented in PyTorch, except TFT, which is implemented in MXNet. Our implementation is based on GluonTS. We use the TFT and DeepAR provided in GluonTS, and implement PatchTST, FEDformer, and our TSMixer ourselves." The paper mentions PyTorch, MXNet, and GluonTS but does not specify version numbers.
Experiment Setup: Yes — "We set the input length L = 512 as suggested in Nie et al. (2023) and evaluate the results for prediction lengths of T = {96, 192, 336, 720}. We use the Adam optimization algorithm (Kingma & Ba, 2015) to minimize the mean squared error (MSE) training objective, and consider MSE and mean absolute error (MAE) as the evaluation metrics. We apply reversible instance normalization (Kim et al., 2022) to ensure a fair comparison with the state-of-the-art PatchTST (Nie et al., 2023). For the M5 dataset, we mostly follow the data processing from Alexandrov et al. (2020). We consider the prediction length of T = 28 (the same as the competition) and set the input length to L = 35. We optimize the log-likelihood of the negative binomial distribution as suggested by Salinas et al. (2020). We follow the competition's protocol (Makridakis et al., 2022) to aggregate the predictions at different levels and evaluate them using the weighted root mean squared scaled error (WRMSSE). More details about the experimental setup and hyperparameter tuning can be found in Appendices C and E. For the long-term forecasting datasets, we train each model for a maximum of 100 epochs and stop early if the validation loss does not improve for 5 epochs; for M5, we train for a maximum of 300 epochs and stop early if the validation loss does not improve for 30 epochs. We noticed that optimizing other objective functions might yield significantly worse results when evaluated with WRMSSE. To obtain more stable results, for all models we take the top 8 hyperparameter settings based on validation WRMSSE, train them for an additional 4 trials (totaling 5), select the best hyperparameters based on their mean validation WRMSSE, and then report the evaluation results on the test set. The hyperparameter settings can be found in Appendix E."
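The early-stopping rule described above (stop once validation loss has not improved for a fixed number of epochs) can be sketched as follows; the function and argument names are illustrative, not from the paper's code:

```python
def train_with_early_stopping(max_epochs, patience, val_loss_fn):
    """Run up to max_epochs, stopping once validation loss has not
    improved for `patience` consecutive epochs.

    The paper uses max_epochs=100, patience=5 for the long-term
    forecasting datasets and max_epochs=300, patience=30 for M5.
    `val_loss_fn(epoch)` stands in for one epoch of training plus
    validation; it returns that epoch's validation loss.
    """
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        loss = val_loss_fn(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_loss, best_epoch
```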