Optimizing Time Series Forecasting Architectures: A Hierarchical Neural Architecture Search Approach

Authors: Difan Deng, Marius Lindauer

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Results on long-term time series forecasting tasks show that our approach can search for lightweight, high-performing forecasting architectures across different forecasting tasks." ... (Section 1, Introduction) ... (Section 6, Experiments) "In this section, we first demonstrate that DARTS-TS can identify the optimal architecture, which is comparable to many other handcrafted architectures on various datasets. ... We compare our results with the following baselines: PatchTST (Nie et al., 2023), ModernTCN (Luo and Wang, 2024), DLinear (Zeng et al., 2023), TSMixer (Chen et al., 2023), iTransformer (Liu et al., 2024b), Autoformer (Wu et al., 2021), and TimesNet (Wu et al., 2023). ... The results for long-term forecasting tasks are shown in Table 1. ... While on another benchmark, the PEMS dataset, DARTS-TS outperforms all the other baselines for all the tasks, as shown in Table 2." ... (Appendix E, Ablation Study)
Researcher Affiliation | Academia | Difan Deng (EMAIL), Institute of Artificial Intelligence, Leibniz University Hannover; Marius Lindauer (EMAIL), Institute of Artificial Intelligence, Leibniz University Hannover, and L3S Research Center
Pseudocode | No | The paper describes the proposed architecture search space and the search strategy (DARTS) in detail, but it does not include any explicit pseudocode blocks or algorithms.
Open Source Code | Yes | "Our code can be found on https://github.com/automl/OneShotForecastingNAS.git."
Open Datasets | Yes | "We evaluate our architecture search framework on the popular long-term forecasting datasets introduced by Wu et al. (2021) and Zeng et al. (2023): Weather, Traffic, Exchange Rate (Exchange), Electricity (ECL), and four ETT datasets. Additionally, we evaluate our approach on the four PEMS datasets (Chen et al., 2001) that record the public traffic network data in California. ... The training/validation/test splits of each dataset follow the rules defined in the Time Series Library (https://github.com/thuml/Time-Series-Library)."
Dataset Splits | Yes | "During the search phase, we divide the official training/validation splits of the raw dataset into training and validation sets of the same size, such that they do not overlap with the test sets. The training and validation sets are then used to optimize the architecture weights and architecture parameters, respectively. We follow the common practice for the training-validation split in time series forecasting tasks: the validation set is located at the tail of the training set. The first half of the dataset is considered the training set, used to optimize the weights of the supernet, while the second half is the validation set, used to optimize the architecture parameters. ... The training/validation/test splits of each dataset follow the rules defined in the Time Series Library, where a fixed ratio of each series is reserved as the test set, i.e., each test set might contain more time steps than the required forecasting horizon; we therefore perform rolling-origin evaluation by continuously moving the forecasting origin until we iterate through the entire test set."
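The split-and-evaluate protocol quoted above (halving the official training portion into supernet-training and architecture-validation sets, then rolling the forecasting origin through the test segment) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names, the stride of 1, and the example lengths are assumptions.

```python
import numpy as np

def split_search_sets(train_val: np.ndarray):
    """Split the official training portion into two non-overlapping halves:
    the first half trains the supernet weights, the second half (the tail)
    tunes the architecture parameters. Hypothetical helper, not from the paper."""
    mid = len(train_val) // 2
    return train_val[:mid], train_val[mid:]

def rolling_origin_windows(test: np.ndarray, lookback: int, horizon: int, stride: int = 1):
    """Yield (context, target) pairs by moving the forecasting origin
    step by step through the test segment until it is exhausted."""
    for origin in range(lookback, len(test) - horizon + 1, stride):
        yield test[origin - lookback:origin], test[origin:origin + horizon]

# Toy series of 1000 steps: 800 for search (halved), 200 held out as test.
series = np.arange(1000, dtype=float)
train_half, val_half = split_search_sets(series[:800])
windows = list(rolling_origin_windows(series[800:], lookback=96, horizon=24))
print(len(train_half), len(val_half), len(windows))
```

Each yielded pair is one forecast instance; averaging the error over all of them gives the rolling-origin test score described in the quote.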
Hardware Specification | Yes | "We run our experiments on a cluster equipped with Nvidia A100 40 GB GPUs and AMD Milan 7763 CPUs. ... This experiment is executed on one single Nvidia 2080 Ti GPU with 11 GB GPU RAM. (However, we search the one-shot model on an Nvidia A100 GPU with 40 GB GPU RAM.)"
Software Dependencies | No | "Here, we use the vanilla Transformer implemented in PyTorch (Paszke et al., 2019) with the hidden size set to 4·d_model."
Experiment Setup | Yes | "During the search phase, we divide the official training/validation splits of the raw dataset into training and validation sets of the same size, such that they do not overlap with the test sets. ... The concrete hyperparameter settings during the search and evaluation phases are presented in Table 5 (Training Hyperparameters (HP) during searching and training phases). ... All look-back window sizes are set to 336 (for long-term forecasting tasks) and 96 (for PEMS tasks) for both the searching and testing phases. ... For the big datasets, we set the number of SeqNet hidden dimensions to 32, while this value for the small datasets is 8. The same approach is applied to the hidden NBATS dimension of FlatNet: we set this value to 256 for the big datasets and 96 for the small datasets."
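The size-dependent hidden-dimension rule quoted above can be written as a small configuration helper. The function and key names below are illustrative assumptions, not identifiers from the authors' released code; only the four numeric values come from the paper.

```python
def hidden_dims(is_big_dataset: bool) -> dict:
    """Pick hidden sizes by dataset scale, mirroring the reported settings:
    SeqNet hidden dim 32 (big) vs 8 (small), FlatNet hidden dim 256 vs 96.
    Names are hypothetical; values are as reported in the paper."""
    if is_big_dataset:
        return {"seqnet_hidden": 32, "flatnet_hidden": 256}
    return {"seqnet_hidden": 8, "flatnet_hidden": 96}

print(hidden_dims(True))
print(hidden_dims(False))
```

Scaling the subnetwork widths with dataset size like this keeps the searched architectures lightweight on the small datasets while leaving capacity for the large ones.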