VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Authors: Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun, Chenghao Liu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments reveal intrinsic similarities between images and real-world time series, suggesting that visual models may offer a free lunch for TSF and highlight the potential for future cross-modality research. Our code is publicly available at https://github.com/Keytoyze/VisionTS. ... Comprehensive evaluations of VISIONTS on large-scale benchmarks across multiple domains demonstrate its significant forecasting performance, surpassing few-shot text-based TSF foundation models and achieving comparable or superior results to zero-shot TS-based models.
Researcher Affiliation Collaboration 1Zhejiang University 2State Street Technology (Zhejiang) Ltd 3Salesforce Research Asia. Correspondence to: Chenghao Liu <EMAIL>, Zhuo Li <EMAIL>.
Pseudocode No The paper describes the methodology in Section 3 and provides a visual representation in Figure 3, but it does not contain any explicitly structured pseudocode or algorithm blocks.
Open Source Code Yes Our code is publicly available at https://github.com/Keytoyze/VisionTS.
Open Datasets Yes We evaluate our proposed VISIONTS on large-scale benchmarks, including 8 long-term TSF (Zhou et al., 2021), 29 Monash (Godahewa et al., 2021), and 23 GIFT-Eval (Aksu et al., 2024) datasets, spanning diverse domains, frequencies, and multivariates. ... a visual masked autoencoder, pre-trained on the ImageNet dataset
Dataset Splits Yes To prevent data leakage, we selected six widely-used datasets from the long-term TSF benchmark that are not included in MOIRAI's pre-training set for evaluation. Since most baselines cannot perform zero-shot forecasting, we report their few-shot results by fine-tuning on 10% of the individual target datasets. ... We conduct hyperparameter tuning on validation sets to determine the optimal context length L, detailed in Appendix B.1.
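The few-shot protocol quoted above (fine-tuning baselines on 10% of each target dataset, with a validation set reserved for tuning the context length L) can be sketched as a chronological split. This is a minimal illustration, not the paper's code; the 70/10/20 split ratios and function names are assumptions.

```python
# Hedged sketch of a chronological few-shot split, as used for the
# baselines in the review above. The 0.7/0.1 train/val ratios and the
# helper name are illustrative assumptions; only the 10% few-shot
# fraction comes from the quoted text.

def few_shot_split(series, train_ratio=0.7, val_ratio=0.1, few_shot_frac=0.1):
    """Split a series chronologically, then keep only the first
    `few_shot_frac` of the training portion for few-shot fine-tuning."""
    n = len(series)
    train_end = int(n * train_ratio)
    val_end = int(n * (train_ratio + val_ratio))
    train = series[:train_end]
    val = series[train_end:val_end]        # used to tune context length L
    test = series[val_end:]
    few_shot_train = train[: max(1, int(len(train) * few_shot_frac))]
    return few_shot_train, val, test

series = list(range(100))                  # toy univariate series
fs_train, val, test = few_shot_split(series)
# fs_train keeps 7 of the 70 training points (10%)
```

Splitting chronologically (rather than randomly) avoids the leakage concern the review highlights: future values never appear in the fine-tuning subset.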
Hardware Specification Yes All experiments are conducted using Time-Series-Library (https://github.com/thuml/Time-Series-Library) and GluonTS library (Alexandrov et al., 2020) on an NVIDIA A800 GPU.
Software Dependencies No The paper mentions 'Time-Series-Library' and 'GluonTS library' but does not specify their version numbers.
Experiment Setup Yes We conduct hyperparameter tuning on validation sets to determine the optimal context length L. ... We set the hyperparameters to r = c = 0.4. ... We use an Adam optimizer with a learning rate 0.0001 and a batch size 256 to fine-tune MAE. All experiments are repeated three times. The training epoch is one for all the datasets except Illness, for which we train MAE for 100 epochs with an early stop due to the limited training dataset scale.
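The fine-tuning schedule quoted above (one epoch for most datasets, up to 100 epochs with early stopping for Illness) can be made concrete with a small sketch. This is illustrative only: the helper names and the patience value are assumptions, and the actual MAE fine-tuning loop lives in the authors' repository.

```python
# Hedged sketch of the epoch budget and early-stopping logic described
# in the setup: Adam with lr 1e-4 and batch size 256, one epoch for all
# datasets except Illness (100 epochs with early stopping). The patience
# value and class/function names are illustrative assumptions.

LEARNING_RATE = 1e-4
BATCH_SIZE = 256

def epochs_for(dataset: str) -> int:
    """Epoch budget per dataset, following the quoted setup."""
    return 100 if dataset == "Illness" else 1

class EarlyStopper:
    """Stop when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Toy validation curve: improves twice, then plateaus.
stopper = EarlyStopper(patience=3)
losses = [0.9, 0.8, 0.85, 0.86, 0.87]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
# Training halts at epoch 4 after three non-improving epochs.
```

Early stopping matters here because Illness is a small dataset: a fixed 100-epoch run would likely overfit, so training halts once validation loss plateaus.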