HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting

Authors: Feng Shibo, Peilin Zhao, Liu Liu, Pengcheng Wu, Zhiqi Shen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on five popular MTS datasets verify the effectiveness of our proposed method. The source code will be released. ... We conducted experiments to evaluate the performance and efficiency of HDT, covering short-term and long-term forecasting as well as robustness to missing values. The evaluation includes 5 real-world benchmarks and 12 baselines."
Researcher Affiliation | Collaboration | "(1) College of Computing and Data Science, Nanyang Technological University (NTU), Singapore; (2) Webank-NTU Joint Research Institute on Fintech, NTU, Singapore; (3) Tencent AI Lab, Shenzhen, China. EMAIL, EMAIL"
Pseudocode | Yes | "The training and inference details are shown in Algorithm 1, 2 and 3. Figure 1 provides an overview of the model architecture."
Open Source Code | No | "The source code will be released."
Open Datasets | Yes | "We extensively evaluate the proposed HDT on five real-world benchmarks, covering the mainstream high-dimensional MTS probabilistic forecasting applications: Solar (Lai et al. 2018), Electricity (Lai et al. 2018), Traffic (Salinas et al. 2019), Taxi (Salinas et al. 2019), and Wikipedia (Gasthaus et al. 2019)."
Dataset Splits | No | "We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs."
Hardware Specification | Yes | "All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs."
Software Dependencies | No | "Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets."
Experiment Setup | Yes | "Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets. The history length is fixed at 96, with prediction lengths of {48, 96, 144}. We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs."
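Since the source code has not yet been released, the experiment setup quoted above can only be summarized as a configuration sketch. The values below are the ones reported in the paper; all field names are illustrative assumptions, not identifiers from the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HDTExperimentConfig:
    """Reported HDT hyperparameters; field names are assumptions."""
    optimizer: str = "adam"                    # paper reports the ADAM optimizer
    learning_rates: tuple = (0.0005, 0.001)    # the two reported initial learning rates
    batch_size: int = 64                       # same batch size across all datasets
    history_length: int = 96                   # fixed input (history) length
    prediction_lengths: tuple = (48, 96, 144)  # evaluated forecast horizons
    num_test_samples: int = 100                # samples drawn to report test metrics
    num_runs: int = 3                          # results averaged over 3 runs

cfg = HDTExperimentConfig()
# Each (horizon, run) pair is one evaluation setting reported in the paper:
settings = [(h, r) for h in cfg.prediction_lengths for r in range(cfg.num_runs)]
print(len(settings))  # 3 horizons x 3 runs = 9 settings
```

This kind of frozen dataclass makes the reported setup easy to reproduce consistently across the 5 benchmarks once the code is available.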