HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting
Authors: Feng Shibo, Peilin Zhao, Liu Liu, Pengcheng Wu, Zhiqi Shen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five popular MTS datasets verify the effectiveness of our proposed method. The source code will be released. ... We conducted experiments to evaluate the performance and efficiency of HDT, covering short-term and long-term forecasting as well as robustness to missing values. The evaluation includes 5 real-world benchmarks and 12 baselines. |
| Researcher Affiliation | Collaboration | ¹College of Computing and Data Science, Nanyang Technological University (NTU), Singapore; ²Webank-NTU Joint Research Institute on Fintech, NTU, Singapore; ³Tencent AI Lab, Shenzhen, China |
| Pseudocode | Yes | The training and inference details are shown in Algorithm 1, 2 and 3. Figure 1 provides an overview of the model architecture. |
| Open Source Code | No | The source code will be released. |
| Open Datasets | Yes | We extensively evaluate the proposed HDT on five real-world benchmarks, covering the mainstream high-dimensional MTS probabilistic forecasting applications, Solar (Lai et al. 2018), Electricity (Lai et al. 2018), Traffic (Salinas et al. 2019), Taxi (Salinas et al. 2019) and Wikipedia (Gasthaus et al. 2019). |
| Dataset Splits | No | We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
| Software Dependencies | No | Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets. |
| Experiment Setup | Yes | Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets. The history length is fixed at 96, with prediction lengths of {48, 96, 144}. We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
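For reference, the quoted experiment setup can be collected into a single configuration sketch. The key names below are my own invention for illustration; only the values come from the quotes in the table above (the paper's code is not yet released, so this is not the authors' actual configuration format):

```python
# Hyperparameters quoted in the HDT reproducibility evidence above.
# Key names are illustrative; values are taken from the paper's quoted setup.
HDT_CONFIG = {
    "optimizer": "Adam",
    "initial_learning_rates": [0.0005, 0.001],  # two rates are reported
    "batch_size": 64,                            # shared across all datasets
    "history_length": 96,                        # fixed input window
    "prediction_lengths": [48, 96, 144],
    "test_samples": 100,                         # samples drawn to report metrics
    "runs": 3,                                   # results averaged over 3 runs
    "hardware": "1x Nvidia A100 GPU",
}

# Sanity check: every prediction length is a multiple of half the history window.
assert all(p % (HDT_CONFIG["history_length"] // 2) == 0
           for p in HDT_CONFIG["prediction_lengths"])
```

Note that the quote does not specify which learning rate applies to which dataset or module, so both reported values are kept as a list.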