Optimizing Time Series Forecasting Architectures: A Hierarchical Neural Architecture Search Approach

Authors: Difan Deng, Marius Lindauer

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Results on long-term time series forecasting tasks show that our approach can search for lightweight, high-performing forecasting architectures across different forecasting tasks." ... (Section 1, Introduction) ... (Section 6, Experiments) "In this section, we first demonstrate that DARTS-TS can identify the optimal architecture, which is comparable to many other handcrafted architectures on various datasets. ... We compare our results with the following baselines: PatchTST (Nie et al., 2023), ModernTCN (Luo and Wang, 2024), DLinear (Zeng et al., 2023), TSMixer (Chen et al., 2023), iTransformer (Liu et al., 2024b), Autoformer (Wu et al., 2021), and TimesNet (Wu et al., 2023). ... The results for long-term forecasting tasks are shown in Table 1. ... While on another benchmark, the PEMS dataset, DARTS-TS outperforms all the other baselines for all the tasks, as shown in Table 2." ... (Appendix E, Ablation Study)
Researcher Affiliation | Academia | Difan Deng (EMAIL), Institute of Artificial Intelligence, Leibniz University Hannover; Marius Lindauer (EMAIL), Institute of Artificial Intelligence, Leibniz University Hannover, and L3S Research Center
Pseudocode | No | The paper describes the proposed architecture search space and the search strategy (DARTS) in detail, but it does not include any explicit pseudocode blocks or algorithms.
Open Source Code | Yes | "Our code can be found on https://github.com/automl/OneShotForecastingNAS.git."
Open Datasets | Yes | "We evaluate our architecture search framework on the popular long-term forecasting datasets introduced by Wu et al. (2021) and Zeng et al. (2023): Weather, Traffic, Exchange Rate (Exchange), Electricity (ECL), and four ETT datasets. Additionally, we evaluate our approach on the four PEMS datasets (Chen et al., 2001) that record the public traffic network data in California. ... The training/validation/test splits of each dataset follow the rules defined in the Time Series Library (https://github.com/thuml/Time-Series-Library)."
Dataset Splits | Yes | "During the search phase, we divide the official training/validation splits of the raw dataset into training and validation sets of the same size, such that they do not overlap with the test sets. The training and validation sets are then used to optimize the architecture weights and architecture parameters, respectively. We follow the common practice for the training-validation split in time series forecasting tasks: the validation set is located at the tail of the training set. The first half of the dataset is considered the training set, used to optimize the weights of the supernet, while the second half is the validation set, used to optimize the architecture parameters. ... The training/validation/test splits of each dataset follow the rules defined in the Time Series Library, where a fixed ratio of each series is reserved as the test set, i.e., each test set might contain more time steps than the required forecasting horizon; we therefore perform rolling-origin evaluation by continuously moving the forecasting origin until we iterate through the entire test set."
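The split-and-evaluate protocol quoted above (halving the official training portion into supernet-training and architecture-validation sets, then rolling the forecasting origin through the test segment) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names, the stride of 1, and the example lengths are assumptions.

```python
import numpy as np

def split_search_sets(train_val: np.ndarray):
    """Split the official training portion into two non-overlapping halves:
    the first half trains the supernet weights, the second half (the tail)
    tunes the architecture parameters. Hypothetical helper, not from the paper."""
    mid = len(train_val) // 2
    return train_val[:mid], train_val[mid:]

def rolling_origin_windows(test: np.ndarray, lookback: int, horizon: int, stride: int = 1):
    """Yield (context, target) pairs by moving the forecasting origin
    step by step through the test segment until it is exhausted."""
    for origin in range(lookback, len(test) - horizon + 1, stride):
        yield test[origin - lookback:origin], test[origin:origin + horizon]

# Toy series of 1000 steps: 800 for search (halved), 200 held out as test.
series = np.arange(1000, dtype=float)
train_half, val_half = split_search_sets(series[:800])
windows = list(rolling_origin_windows(series[800:], lookback=96, horizon=24))
print(len(train_half), len(val_half), len(windows))
```

Each yielded pair is one forecast instance; averaging the error over all of them gives the rolling-origin test score described in the quote.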
Hardware Specification | Yes | "We run our experiments on a cluster equipped with Nvidia A100 40 GB GPUs and AMD Milan 7763 CPUs. ... This experiment is executed on one single Nvidia 2080 Ti GPU with 11 GB GPU RAM. (However, we search the one-shot model on an Nvidia A100 GPU with 40 GB GPU RAM.)"
Software Dependencies | No | "Here, we use the vanilla Transformer implemented in PyTorch (Paszke et al., 2019) with the hidden size set to 4·d_model."
Experiment Setup | Yes | "During the search phase, we divide the official training/validation splits of the raw dataset into training and validation sets of the same size, such that they do not overlap with the test sets. ... The concrete hyperparameter settings during the search and evaluation phases are presented in Table 5 (Training Hyperparameters (HP) during searching and training phases). ... All look-back window sizes are set to 336 (for long-term forecasting tasks) and 96 (for PEMS tasks) for both the searching and testing phases. ... For the big datasets, we set the number of SeqNet hidden dimensions to 32, while this value for the small datasets is 8. The same approach is applied to the hidden NBATS dimension of FlatNet: we set this value to 256 for the big datasets and 96 for the small datasets."
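The size-dependent hidden-dimension rule quoted above can be written as a small configuration helper. The function and key names below are illustrative assumptions, not identifiers from the authors' released code; only the four numeric values come from the paper.

```python
def hidden_dims(is_big_dataset: bool) -> dict:
    """Pick hidden sizes by dataset scale, mirroring the reported settings:
    SeqNet hidden dim 32 (big) vs 8 (small), FlatNet hidden dim 256 vs 96.
    Names are hypothetical; values are as reported in the paper."""
    if is_big_dataset:
        return {"seqnet_hidden": 32, "flatnet_hidden": 256}
    return {"seqnet_hidden": 8, "flatnet_hidden": 96}

print(hidden_dims(True))
print(hidden_dims(False))
```

Scaling the subnetwork widths with dataset size like this keeps the searched architectures lightweight on the small datasets while leaving capacity for the large ones.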