Data Augmentation Policy Search for Long-Term Forecasting
Authors: Liran Nochumsohn, Omri Azencot
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on challenging univariate and multivariate forecasting benchmark problems demonstrate that TSAA consistently outperforms several robust baselines, suggesting its potential integration into prediction pipelines. Code is available at this repository: https://github.com/azencot-group/TSAA. |
| Researcher Affiliation | Academia | Liran Nochumsohn EMAIL Department of Computer Science Ben-Gurion University of the Negev Omri Azencot EMAIL Department of Computer Science Ben-Gurion University of the Negev |
| Pseudocode | Yes | Algorithm 1 Time-Series Auto Augment (TSAA) |
| Open Source Code | Yes | Code is available at this repository: https://github.com/azencot-group/TSAA. |
| Open Datasets | Yes | For each of the given baseline models, we apply TSAA on six commonly-used datasets in the literature of long-term time-series forecasting: (1) ETTm2 (Zhou et al., 2021) contains electricity transformer oil temperature data alongside 6 power load features. (2) Electricity (Zhou et al., 2021) is a collection of hourly electricity consumption data over the span of 2 years. (3) Exchange (Lai et al., 2018) consists of 17 years of daily foreign exchange rate records representing different currency pairs. (4) Traffic (Zhou et al., 2021) is an hourly reported sensor data containing information about road occupancy rates. (5) Weather (Zhou et al., 2021) contains 21 different meteorological measurements, recorded every 10 minutes for an entire year. (6) ILI (Wu et al., 2021) includes weekly recordings of influenza-like illness patients. |
| Dataset Splits | No | The paper states input lengths and forecast horizons (e.g., "input length is 96 and the evaluated forecast horizon corresponds to 96, 192, 336, or 720"), but it does not explicitly give the training/validation/test splits as percentages or sample counts. It refers to a "similar setup to (Wu et al., 2021; Zhou et al., 2022)", implying the standard splits from those works, without restating them. |
| Hardware Specification | Yes | The PyTorch library (Paszke et al., 2019) is used for all model implementations, and executed with an NVIDIA GeForce RTX 3090 24GB. |
| Software Dependencies | No | The paper mentions "The PyTorch library (Paszke et al., 2019)", the "ADAM optimizer (Kingma & Ba, 2015)", and "Optuna (Akiba et al., 2019)", but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | The model weights are optimized with respect to the mean squared error (MSE) using the ADAM optimizer (Kingma & Ba, 2015) with an initial learning rate of 10⁻³ for N-BEATS and 10⁻⁴ for Transformer-based models. The maximum number of epochs is set to 10, allowing early stopping with a patience parameter of 3. ... The number of trials Tmax is set to 100. For TPE, in order to guarantee aggressive exploration at the beginning, the first 30% of trials are run with random search. For ASHA, r and η are set globally to 1 and 3, respectively. The maximum resource parameter R, representing the epochs, is set differently for each experiment, due to the baseline's early stopping. ... a maximum of k best policies are selected to obtain pθ, where k = 3. |
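The search budget described above (Tmax = 100 trials, a 30% random-search warm-up before TPE, and ASHA with r = 1 and η = 3 capped at a per-experiment R) can be sketched as plain Python. This is a minimal illustration under our own naming, not code from the TSAA repository; `asha_rungs` and `sampler_for_trial` are hypothetical helpers.

```python
# Sketch of the reported hyperparameter-search schedule (assumed names,
# not from the TSAA codebase).

def asha_rungs(r: int, eta: int, R: int) -> list:
    """Epoch budgets per ASHA rung: r, r*eta, r*eta^2, ... up to R.
    With the paper's r=1, eta=3 and e.g. R=10, this yields [1, 3, 9]."""
    rungs, budget = [], r
    while budget <= R:
        rungs.append(budget)
        budget *= eta
    return rungs

def sampler_for_trial(trial_idx: int, t_max: int = 100,
                      warmup_frac: float = 0.3) -> str:
    """First 30% of the Tmax=100 trials use random search, the rest TPE."""
    return "random" if trial_idx < int(warmup_frac * t_max) else "tpe"
```

In a framework like Optuna (which the paper cites), the same schedule would correspond to a `TPESampler` with a random-search warm-up and a `SuccessiveHalvingPruner` with `min_resource=1` and `reduction_factor=3`.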