The Time-Energy Model: Selective Time-Series Forecasting Using Energy-Based Models
Authors: Jonas Brusokas, Seshu Tirupathi, Dalin Zhang, Torben Bach Pedersen
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that TEM generalizes well across 5 state-of-the-art deterministic time-series forecasting models and 5 benchmark time-series forecasting datasets. Using selective forecasting, TEM reduces prediction error by up to 49.1% over 5 state-of-the-art deterministic models. Furthermore, TEM has up to 87.0% lower error than selected baseline EBM models, and achieves significantly better performance than state-of-the-art selective deep learning models. |
| Researcher Affiliation | Collaboration | Jonas Brusokas (EMAIL), Department of Computer Science, Aalborg University; Seshu Tirupathi (EMAIL), IBM Research, Ireland; Dalin Zhang (EMAIL), Department of Computer Science, Aalborg University; Torben Bach Pedersen (EMAIL), Department of Computer Science, Aalborg University |
| Pseudocode | Yes | A detailed description of the CD training method is provided in Appendix Algorithm 1. Algorithm 1 Calculating Contrastive Divergence (CD) loss |
| Open Source Code | Yes | Code for the proposed TEM framework is available at https://github.com/JonasBrusokas/Time-Energy-Model. |
| Open Datasets | Yes | To evaluate TEM performance, we use five open benchmark time-series datasets. The Electricity Transformer Temperature datasets: ETTh1, ETTh2 (Zhou et al., 2021) contain 2 years of hourly temperature measurements from two electricity transformers in separate Chinese counties, each with 7 sensor features. The Exchange Rate dataset contains the daily exchange rates between 8 different currencies against USD from 1990 to 2016, with XRP/USD as the target variable for forecasting. The Weather dataset contains 4 years of daily weather measurements from 21 monitoring stations across Canada, with the target variable being the temperature readings from a specific station. The National Illness dataset contains weekly influenza-like illness ratios reported by the US Centers for Disease Control (CDC), containing data from 2002 to 2021 across multiple US regions. These datasets were selected as they are commonly used to benchmark time-series forecasting models and we re-use the data preprocessing and splitting procedures, as found in recent state-of-the-art deterministic forecasting model literature (Zhou et al., 2021; Wu et al., 2021; Zhou et al., 2022; Wu et al., 2023; Nie et al., 2023). Additional statistics for the datasets are provided in Appendix 6. |
| Dataset Splits | Yes | These datasets were selected as they are commonly used to benchmark time-series forecasting models and we re-use the data preprocessing and splitting procedures, as found in recent state-of-the-art deterministic forecasting model literature (Zhou et al., 2021; Wu et al., 2021; Zhou et al., 2022; Wu et al., 2023; Nie et al., 2023). Additional statistics for the datasets are provided in Appendix 6. Table 6 (Statistics for Time Series datasets used in experiments; dataset size given as train/validation/test): ETTh1: 7 features, (8545, 2881, 2881), hourly, electricity; ETTh2: 7 features, (8545, 2881, 2881), hourly, electricity; Exchange: 8 features, (5120, 665, 1422), daily, exchange rate; Weather: 21 features, (36792, 5271, 10540), daily, weather; National Illness: 7 features, (617, 74, 170), weekly, illness. For all experiments we use the same training, validation, and test data splits as the original papers. Sequence length m was set to 96 for all models, with prediction horizon h set to 48. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It mentions using 'deep learning methods' and discusses inference speed but lacks concrete hardware specifications. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Transformer-based models' but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, etc.). It references a GitHub repository for adapted model implementations but does not list the specific software dependencies with versions used for the experiments in this paper. |
| Experiment Setup | Yes | For the deterministic forecasting models Informer, Autoformer, FEDformer, PatchTST, and TimesNet AΨ we re-use known hyperparameters from their respective experiments (Zhou et al., 2021; Wu et al., 2021; Zhou et al., 2022; Wu et al., 2023; Nie et al., 2023). For the MLP-based encoder and decoder θy, θxy parameterizations, we use 4 layers for each with 128 hidden units in each of the fully connected layers. For TEM selective forecasting using Aggregated energy inference, as described in Section 3.4, we select the covariance coefficient σ² for the multivariate normal distribution N(0, σ²I) from which we draw noise samples δi. We select one σ² ∈ {0.0, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5} for each trained TEM model. For each prediction Ŷ, we draw 32 samples δi to generate aggregated energy Eθ(X, Ŷ). For TEM selective forecasting using Energy optimization inference, we perform gradient descent with the Adam optimizer using step sizes η ∈ {0.1, 0.01, 0.001} and step counts T ∈ {5, 10, 25}. In this section, we provide additional implementation details. The base implementation of the TEM models presented in this paper is available at https://github.com/JonasBrusokas/Time-Energy-Model. The deterministic forecasting models (Informer, Autoformer, FEDformer, PatchTST, and TimesNet) were trained according to the hyperparameters specified in their respective papers (Zhou et al., 2021; Wu et al., 2021; Zhou et al., 2022; Wu et al., 2023; Nie et al., 2023). For all experiments we use the same training, validation, and test data splits as the original papers. Sequence length m was set to 96 for all models, with prediction horizon h set to 48. For deterministic forecasting models AΨ, we recreate the model architectures and training procedures as described in the original papers, using the same hyperparameters. All models were trained with 2-layer encoders and 1-layer decoders. All deterministic models were trained using the Mean Squared Error loss function with the Adam optimizer, learning rate 0.0001, and dropout rate 0.05. Deterministic models were trained for up to 30 epochs, using early stopping with patience 3. Transformer-based models (Informer, Autoformer, FEDformer, and PatchTST) were trained with 8 attention heads and 512-dimensional embedding, attention, and feed-forward layers. |
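The Aggregated energy inference step described in the Experiment Setup row (draw δi ~ N(0, σ²I), evaluate the energy at each perturbed prediction, average over 32 samples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_energy` is a hypothetical stand-in for the learned energy network Eθ(X, Ŷ), and the acceptance threshold is illustrative.

```python
import numpy as np

def toy_energy(x, y_hat):
    """Hypothetical stand-in energy: penalizes forecasts far from the context mean."""
    return float(np.mean((y_hat - x.mean()) ** 2))

def aggregated_energy(energy_fn, x, y_hat, sigma2=0.05, n_samples=32, rng=None):
    """Average energy over noise-perturbed copies of the prediction.

    Draws delta_i ~ N(0, sigma2 * I), evaluates energy_fn(x, y_hat + delta_i)
    for each sample, and returns the mean. In selective forecasting, a
    prediction is rejected when this aggregated energy exceeds a threshold.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    deltas = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, y_hat.size))
    return float(np.mean([energy_fn(x, y_hat + d) for d in deltas]))

x = np.zeros(96)       # input sequence, length m = 96 (as in the paper's setup)
y_hat = np.zeros(48)   # deterministic forecast, horizon h = 48
e = aggregated_energy(toy_energy, x, y_hat, sigma2=0.05, n_samples=32)
# With y_hat equal to x.mean() everywhere, e is close to sigma2 (the
# expected squared noise magnitude per dimension).
accept = e < 0.1       # threshold choice is application-specific
```

The σ² value and the 32-sample count mirror the grid described above; everything else (the quadratic toy energy, the zero-valued inputs, the 0.1 threshold) is an assumption made to keep the sketch self-contained.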