Are Data Embeddings Effective in Time Series Forecasting?

Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive ablation studies across fifteen state-of-the-art models on multiple benchmark datasets, we find that removing data embedding layers does not degrade forecasting performance; in many cases, it improves both accuracy and computational efficiency.
Researcher Affiliation | Academia | Reza Nematirad (EMAIL), Department of Electrical and Computer Engineering, Kansas State University; Anil Pahwa (EMAIL), Department of Electrical and Computer Engineering, Kansas State University; Balasubramaniam Natarajan (EMAIL), Department of Electrical and Computer Engineering, Kansas State University
Pseudocode | No | The paper describes various embedding techniques and architectural modifications but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Tims2D/Data Embedding.
Open Datasets | Yes | We evaluate all models on seven widely used benchmark datasets spanning diverse domains and temporal resolutions: ETTh1 and ETTh2 (hourly), ETTm1 and ETTm2 (15-minute) representing electricity transformer temperature data where each timestamp is a 7-dimensional vector, Weather (10-minute meteorological observations with 21 variables per timestamp), Exchange (8-dimensional daily foreign exchange rate vectors), and National Illness (7-dimensional weekly illness case-rate vectors across U.S. regions). These datasets capture diverse temporal patterns, sampling frequencies (10-minute to weekly), and feature dimensions (from 7 to 21) across various domains (Jin et al., 2024).
Dataset Splits | Yes | All input time series are normalized using the mean and standard deviation from the training set. The sequence length is fixed in both embedding settings. For all datasets except National Illness, prediction horizons are H ∈ {96, 192, 336, 720}. For National Illness, due to its weekly resolution, we use H ∈ {24, 36, 48, 60}. Forecasting accuracy is evaluated using MSE and MAE. ... Table 9: Summary of benchmark datasets used in this study. ... Dimension | Train / Val / Test | Frequency | Duration ... ETTm1 | 7 | 34,465 / 11,521 / 11,521 | 15 minutes | Jul 2016 – Jul 2018
Hardware Specification | Yes | All experiments are conducted on a high-performance Linux workstation equipped with an NVIDIA L40S GPU (46 GB memory), CUDA version 12.9, and dual AMD EPYC 7713 64-core processors (128 threads in total). The system has 1 TB of RAM and runs on Ubuntu with Python 3.10 and PyTorch 2.2.1.
Software Dependencies | Yes | The system has 1 TB of RAM and runs on Ubuntu with Python 3.10 and PyTorch 2.2.1.
Experiment Setup | Yes | All input time series are normalized using the mean and standard deviation from the training set. The sequence length is fixed in both embedding settings. For all datasets except National Illness, prediction horizons are H ∈ {96, 192, 336, 720}. For National Illness, due to its weekly resolution, we use H ∈ {24, 36, 48, 60}. Forecasting accuracy is evaluated using MSE and MAE. Computational efficiency is assessed through multiple metrics: (1) average training time per epoch, with breakdowns for data loading, forward pass, and backward pass with optimization; (2) GPU memory usage, including both peak allocated and peak reserved memory; and (3) inference latency per sample. All timing metrics are reported in seconds, and memory usage in megabytes (MB).
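The ablation under Research Type — removing a model's data-embedding layer and feeding raw variables instead — can be sketched in PyTorch. This is a hedged, minimal illustration, not the authors' code: `TinyForecaster`, its dimensions, and the identity-map swap are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical toy forecaster whose front-end "data embedding" can be
# swapped for an identity map, mimicking the remove-the-embedding ablation.
class TinyForecaster(nn.Module):
    def __init__(self, n_vars: int, horizon: int,
                 d_model: int = 64, use_embedding: bool = True):
        super().__init__()
        # Data embedding: projects each timestamp's variables into d_model.
        # With use_embedding=False, the raw variables pass through unchanged.
        self.embed = nn.Linear(n_vars, d_model) if use_embedding else nn.Identity()
        width = d_model if use_embedding else n_vars
        self.head = nn.Linear(width, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_vars) -> (batch, seq_len, horizon)
        return self.head(self.embed(x))

x = torch.randn(8, 96, 7)  # ETT-style 7-variate input, lookback 96
with_emb = TinyForecaster(7, 96, use_embedding=True)(x)
no_emb = TinyForecaster(7, 96, use_embedding=False)(x)
```

Both variants produce outputs of the same shape, so the two settings can be compared on identical data splits and horizons.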
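The preprocessing quoted under Dataset Splits — chronological train/val/test splits and z-score normalization using training-set statistics only, evaluated with MSE and MAE — can be sketched as follows. Split sizes follow the ETTm1 row of Table 9; the data itself and all variable names are illustrative placeholders.

```python
import numpy as np

# Placeholder series with ETTm1-like shape: 57,507 timestamps x 7 variables
# (34,465 + 11,521 + 11,521, per Table 9).
series = np.random.randn(57_507, 7)
n_train, n_val = 34_465, 11_521
train = series[:n_train]
val = series[n_train:n_train + n_val]
test = series[n_train + n_val:]

# Normalize all splits with the *training-set* mean and std only,
# so no information leaks from validation/test into preprocessing.
mean, std = train.mean(axis=0), train.std(axis=0)
train_n, val_n, test_n = [(s - mean) / std for s in (train, val, test)]

# Forecasting accuracy metrics as stated in the setup.
def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))
```

Using training statistics for every split is what makes the comparison between embedding settings fair: both see identically scaled inputs.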
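The efficiency metrics under Experiment Setup — per-phase timing plus peak allocated and peak reserved GPU memory in MB — can be measured with standard `time.perf_counter` and `torch.cuda` calls. The model and batch below are placeholders, not the paper's setup; the measurement pattern is the point.

```python
import time
import torch
import torch.nn as nn

# Placeholder model/batch standing in for a real forecaster and loader.
model = nn.Linear(7, 96)
batch = torch.randn(32, 7)
opt = torch.optim.Adam(model.parameters())

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()  # start peak tracking fresh

t0 = time.perf_counter()
pred = model(batch)                       # forward pass
t1 = time.perf_counter()
loss = nn.functional.mse_loss(pred, torch.randn(32, 96))
loss.backward()                           # backward pass
opt.step()                                # optimization step
opt.zero_grad()
t2 = time.perf_counter()

print(f"forward: {t1 - t0:.4f}s, backward+opt: {t2 - t1:.4f}s")
if torch.cuda.is_available():
    # Peak allocated vs. reserved memory, reported in MB as in the paper.
    alloc_mb = torch.cuda.max_memory_allocated() / 2**20
    reserved_mb = torch.cuda.max_memory_reserved() / 2**20
    print(f"peak allocated: {alloc_mb:.1f} MB, reserved: {reserved_mb:.1f} MB")
```

Averaging such per-phase timings over an epoch, and reading the CUDA peak-memory counters after it, yields the training-time breakdowns and memory figures the setup describes.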