Are Data Embeddings Effective in Time Series Forecasting?

Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive ablation studies across fifteen state-of-the-art models on multiple benchmark datasets, we find that removing data embedding layers does not degrade forecasting performance; in many cases, it improves both accuracy and computational efficiency.
Researcher Affiliation | Academia | Reza Nematirad (EMAIL), Department of Electrical and Computer Engineering, Kansas State University; Anil Pahwa (EMAIL), Department of Electrical and Computer Engineering, Kansas State University; Balasubramaniam Natarajan (EMAIL), Department of Electrical and Computer Engineering, Kansas State University
Pseudocode | No | The paper describes various embedding techniques and architectural modifications but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Tims2D/Data Embedding.
Open Datasets | Yes | We evaluate all models on seven widely used benchmark datasets spanning diverse domains and temporal resolutions: ETTh1 and ETTh2 (hourly), ETTm1 and ETTm2 (15-minute) representing electricity transformer temperature data where each timestamp is a 7-dimensional vector, Weather (10-minute meteorological observations with 21 variables per timestamp), Exchange (8-dimensional daily foreign exchange rate vectors), and National Illness (7-dimensional weekly illness case-rate vectors across U.S. regions). These datasets capture diverse temporal patterns, sampling frequencies (10-minute to weekly), and feature dimensions (from 7 to 21) across various domains (Jin et al., 2024).
Dataset Splits | Yes | All input time series are normalized using the mean and standard deviation from the training set. The sequence length is fixed in both embedding settings. For all datasets except National Illness, prediction horizons are H ∈ {96, 192, 336, 720}. For National Illness, due to its weekly resolution, we use H ∈ {24, 36, 48, 60}. Forecasting accuracy is evaluated using MSE and MAE. ... Table 9: Summary of benchmark datasets used in this study. ... Dimension | Train / Val / Test | Frequency | Duration ... ETTm1 | 7 | 34,465 / 11,521 / 11,521 | 15 minutes | Jul 2016 – Jul 2018
Hardware Specification | Yes | All experiments are conducted on a high-performance Linux workstation equipped with an NVIDIA L40S GPU (46 GB memory), CUDA version 12.9, and dual AMD EPYC 7713 64-core processors (128 threads in total). The system has 1 TB of RAM and runs on Ubuntu with Python 3.10 and PyTorch 2.2.1.
Software Dependencies | Yes | The system has 1 TB of RAM and runs on Ubuntu with Python 3.10 and PyTorch 2.2.1.
Experiment Setup | Yes | All input time series are normalized using the mean and standard deviation from the training set. The sequence length is fixed in both embedding settings. For all datasets except National Illness, prediction horizons are H ∈ {96, 192, 336, 720}. For National Illness, due to its weekly resolution, we use H ∈ {24, 36, 48, 60}. Forecasting accuracy is evaluated using MSE and MAE. Computational efficiency is assessed through multiple metrics: (1) average training time per epoch, with breakdowns for data loading, forward pass, and backward pass with optimization; (2) GPU memory usage, including both peak allocated and peak reserved memory; and (3) inference latency per sample. All timing metrics are reported in seconds, and memory usage in megabytes (MB).
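The ablation under Research Type — removing a model's data-embedding layer and feeding raw variables instead — can be sketched in PyTorch. This is a hedged, minimal illustration, not the authors' code: `TinyForecaster`, its dimensions, and the identity-map swap are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical toy forecaster whose front-end "data embedding" can be
# swapped for an identity map, mimicking the remove-the-embedding ablation.
class TinyForecaster(nn.Module):
    def __init__(self, n_vars: int, horizon: int,
                 d_model: int = 64, use_embedding: bool = True):
        super().__init__()
        # Data embedding: projects each timestamp's variables into d_model.
        # With use_embedding=False, the raw variables pass through unchanged.
        self.embed = nn.Linear(n_vars, d_model) if use_embedding else nn.Identity()
        width = d_model if use_embedding else n_vars
        self.head = nn.Linear(width, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_vars) -> (batch, seq_len, horizon)
        return self.head(self.embed(x))

x = torch.randn(8, 96, 7)  # ETT-style 7-variate input, lookback 96
with_emb = TinyForecaster(7, 96, use_embedding=True)(x)
no_emb = TinyForecaster(7, 96, use_embedding=False)(x)
```

Both variants produce outputs of the same shape, so the two settings can be compared on identical data splits and horizons.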
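The preprocessing quoted under Dataset Splits — chronological train/val/test splits and z-score normalization using training-set statistics only, evaluated with MSE and MAE — can be sketched as follows. Split sizes follow the ETTm1 row of Table 9; the data itself and all variable names are illustrative placeholders.

```python
import numpy as np

# Placeholder series with ETTm1-like shape: 57,507 timestamps x 7 variables
# (34,465 + 11,521 + 11,521, per Table 9).
series = np.random.randn(57_507, 7)
n_train, n_val = 34_465, 11_521
train = series[:n_train]
val = series[n_train:n_train + n_val]
test = series[n_train + n_val:]

# Normalize all splits with the *training-set* mean and std only,
# so no information leaks from validation/test into preprocessing.
mean, std = train.mean(axis=0), train.std(axis=0)
train_n, val_n, test_n = [(s - mean) / std for s in (train, val, test)]

# Forecasting accuracy metrics as stated in the setup.
def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))
```

Using training statistics for every split is what makes the comparison between embedding settings fair: both see identically scaled inputs.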
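The efficiency metrics under Experiment Setup — per-phase timing plus peak allocated and peak reserved GPU memory in MB — can be measured with standard `time.perf_counter` and `torch.cuda` calls. The model and batch below are placeholders, not the paper's setup; the measurement pattern is the point.

```python
import time
import torch
import torch.nn as nn

# Placeholder model/batch standing in for a real forecaster and loader.
model = nn.Linear(7, 96)
batch = torch.randn(32, 7)
opt = torch.optim.Adam(model.parameters())

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()  # start peak tracking fresh

t0 = time.perf_counter()
pred = model(batch)                       # forward pass
t1 = time.perf_counter()
loss = nn.functional.mse_loss(pred, torch.randn(32, 96))
loss.backward()                           # backward pass
opt.step()                                # optimization step
opt.zero_grad()
t2 = time.perf_counter()

print(f"forward: {t1 - t0:.4f}s, backward+opt: {t2 - t1:.4f}s")
if torch.cuda.is_available():
    # Peak allocated vs. reserved memory, reported in MB as in the paper.
    alloc_mb = torch.cuda.max_memory_allocated() / 2**20
    reserved_mb = torch.cuda.max_memory_reserved() / 2**20
    print(f"peak allocated: {alloc_mb:.1f} MB, reserved: {reserved_mb:.1f} MB")
```

Averaging such per-phase timings over an epoch, and reading the CUDA peak-memory counters after it, yields the training-time breakdowns and memory figures the setup describes.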