Timer-XL: Long-Context Transformers for Unified Time Series Forecasting

Authors: Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct evaluations of Timer-XL in three aspects, including (1) supervised training as a task-specific forecaster, (2) large-scale pre-training as a zero-shot forecaster, and (3) assessing the effectiveness of Time Attention and model efficiency. Given that the long-context forecasting paradigm has received less attention in the community, partly because performance saturation on previous benchmarks (Makridakis et al., 2020; Wu et al., 2022) can conceal its benefits, we establish new long-context forecasting benchmarks. Detailed experimental configurations are provided in Appendix B.
Researcher Affiliation | Academia | Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long, School of Software, BNRist, Tsinghua University, Beijing 100084, China
Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/Timer-XL.
Open Datasets | Yes | We conduct experiments on well-acknowledged benchmarks to evaluate performance of the proposed Timer-XL, which includes (1) ETT (Zhou et al., 2021) [...] (9) GTWSF (Wu et al., 2023) is a dataset collected from the National Centers for Environmental Information (NCEI). [...] (10) UTSD (Liu et al., 2024c) is a multi-domain time series dataset, which includes seven domains with a hierarchy of four volumes. We adopt the largest volume that encompasses 1 billion time points for pre-training.
Dataset Splits | Yes | We follow the same data processing and train-validation-test split protocol used in TimesNet (Wu et al., 2022), where the train, validation, and test datasets are divided according to chronological order to prevent data leakage. Detailed dataset descriptions and prediction settings are provided in Table 9.
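The chronological split protocol quoted above can be sketched as follows. This is a minimal illustration of splitting by time order rather than by random shuffling; the 70/10/20 ratios are assumptions for the example, not values taken from the paper.

```python
import numpy as np

def chronological_split(series, train_ratio=0.7, val_ratio=0.1):
    """Split a time series into train/val/test segments in chronological
    order, so no future data leaks into earlier splits.

    Ratios are illustrative placeholders, not the paper's exact settings.
    """
    n = len(series)
    train_end = int(n * train_ratio)
    val_end = int(n * (train_ratio + val_ratio))
    # Earlier time points go to train, the middle to validation,
    # and the most recent points to test.
    return series[:train_end], series[train_end:val_end], series[val_end:]

data = np.arange(100)  # stand-in for a univariate series of 100 time points
train, val, test = chronological_split(data)
```

Because the segments are contiguous in time, every training point precedes every validation point, which in turn precedes every test point.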
Hardware Specification | Yes | All the experiments are implemented by PyTorch (Paszke et al., 2019) on NVIDIA A100 Tensor Core GPUs.
Software Dependencies | No | All the experiments are implemented by PyTorch (Paszke et al., 2019) on NVIDIA A100 Tensor Core GPUs. We employ the Adam optimizer (Kingma & Ba, 2014) and MSE loss for model optimization. A specific version number for PyTorch is not provided in the text.
Experiment Setup | Yes | Detailed experimental configurations are provided in Table 11. We employ the Adam optimizer (Kingma & Ba, 2014) and MSE loss for model optimization.
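The optimization setup stated above (Adam optimizer with MSE loss in PyTorch) can be sketched as a single training step. The model, batch shapes, and learning rate here are placeholders for illustration; they are not the paper's architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Stand-in forecaster: maps a 96-step context window to a 24-step horizon.
# A placeholder for Timer-XL, which is not reproduced here.
model = nn.Linear(96, 24)

# Adam + MSE, as stated in the assessed paper; lr is an assumed value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

x = torch.randn(32, 96)  # dummy batch of input windows
y = torch.randn(32, 24)  # dummy forecast targets

# One optimization step: forward pass, MSE loss, backward pass, update.
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Without a pinned PyTorch version in the paper, any recent release supporting `torch.optim.Adam` and `nn.MSELoss` would suffice for this sketch.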