Recency-Weighted Temporally-Segmented Ensemble for Time Series Modeling

Authors: Pål V. B. Johnsen, Eivind Bøhn, Sølve Eidnes, Filippo Remonato, Signe Riemer-Sørensen

JAIR 2025

Reproducibility assessment (Variable — Result: LLM response):
Research Type — Experimental: "We present a comparative analysis, using two years of data from a wastewater treatment plant and a drinking water treatment plant in Norway, demonstrating the ReWTS ensemble's superiority. It consistently outperforms the global model in terms of mean squared forecasting error across various model architectures by 10-70% on both datasets, notably exhibiting greater resilience to outliers. We further explore the generalizability of ReWTS by applying it to four publicly available datasets."
Researcher Affiliation — Industry: "PÅL V. B. JOHNSEN, EIVIND BØHN, SØLVE EIDNES, FILIPPO REMONATO, and SIGNE RIEMER-SØRENSEN, SINTEF Digital, Norway. Authors' contact information: Pål V. B. Johnsen, ORCID 0000-0002-2599-7914, EMAIL; Eivind Bøhn, ORCID 0000-0002-0676-3688; Sølve Eidnes, ORCID 0000-0002-1002-3543, EMAIL; Filippo Remonato, ORCID 0000-0002-1988-6230; Signe Riemer-Sørensen, ORCID 0000-0002-5308-7651, EMAIL, SINTEF Digital, Oslo, Norway."
Pseudocode — Yes: "Algorithm 1: The h-step forecasting procedure of the ReWTS method"
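The excerpt above names an h-step forecasting procedure that combines per-chunk models with recency-fitted convex weights. The sketch below is hypothetical, not the paper's implementation: the chunk models are toy callables, only two models are used, and the convex-weight fit (w ≥ 0, sum w = 1) is a 1-D grid search over a recent lookback window instead of the CVXOPT quadratic program the paper applies.

```python
# Hypothetical sketch of a recency-weighted ensemble forecast; all names
# (fit_weights, rewts_forecast, lookback) are illustrative, not the paper's API.

def fit_weights(models, lookback, grid=101):
    """Pick convex weights minimizing the squared error of the weighted
    predictions over a recent lookback window of (input, target) pairs.
    With two models this reduces to a 1-D search over w in [0, 1]."""
    m0, m1 = models
    best_w, best_err = 0.0, float("inf")
    for i in range(grid):
        w = i / (grid - 1)
        err = sum((w * m0(x) + (1 - w) * m1(x) - y) ** 2 for x, y in lookback)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

def rewts_forecast(models, lookback, future_inputs, horizon):
    """h-step forecast: fit the ensemble weights once on the lookback
    window, then reuse the same weighted combination over the horizon."""
    w = fit_weights(models, lookback)
    m0, m1 = models
    return [w * m0(x) + (1 - w) * m1(x) for x in future_inputs[:horizon]]

# Toy usage: recent data follows y = 2x, so the fit should put all weight
# on model 0 (which predicts 2x) rather than model 1 (which predicts 0.5x).
m0 = lambda x: 2.0 * x
m1 = lambda x: 0.5 * x
lookback = [(x, 2.0 * x) for x in range(1, 6)]
print(fit_weights((m0, m1), lookback))                   # -> 1.0
print(rewts_forecast((m0, m1), lookback, [6, 7, 8], 2))  # -> [12.0, 14.0]
```

The grid search stands in for the quadratic program only to keep the sketch dependency-free; the objective (squared forecast error on recent data, weights on the simplex) is the same shape of problem.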
Open Source Code — Yes: "The source code for this work is openly available at https://github.com/SINTEF/rewts."
Open Datasets — Yes: "The first dataset is the Air Quality (AQ) dataset (Vito 2008), which contains sensor measurements collected from March 2004 to February 2005 in a polluted area of an Italian city, giving 9357 data points in total. ... The second dataset is the Beijing Multi-Site Air Quality (BMAQ) dataset (S. Chen 2017), which contains hourly recordings of air pollutants... The third dataset investigated is the ELEC2 dataset, which consists of electricity prices collected from the Australian New South Wales Electricity Market between May 1996 and December 1998 (Harries 1999)."
Dataset Splits — Yes: "We denote the first eight chunks to be in the train dataset, and the rest in the test dataset. ... For each chunk in the train dataset, we train one forecasting model until convergence on a hold-out validation set accounting for the last 25% of the chunk. ... To decide on model-specific hyperparameters we reserve the first third (33%) of the data (around half a year) for HPO, using the first 90% to train models and the last 10% to evaluate the hyperparameter choices. ... Each chunk is divided into a 75% training set and a 25% validation set..."
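The splitting scheme quoted above (equal chunks; first eight chunks for training; last 25% of each training chunk held out for validation) can be sketched as follows. This is a minimal illustration under assumed names; the actual chunk length and series handling in the paper's codebase may differ.

```python
# Minimal sketch of the chunked train/test split described in the excerpt.
# split_chunks and its arguments are illustrative names, not the paper's API.

def split_chunks(series, n_chunks, n_train_chunks=8, val_frac=0.25):
    """Cut a series into n_chunks equal chunks. The first n_train_chunks
    are training chunks, each split into a training part and a hold-out
    validation part covering its last val_frac; the rest form the test set."""
    size = len(series) // n_chunks
    chunks = [series[i * size:(i + 1) * size] for i in range(n_chunks)]
    train_chunks, test_chunks = chunks[:n_train_chunks], chunks[n_train_chunks:]
    splits = []
    for c in train_chunks:
        cut = int(len(c) * (1 - val_frac))
        splits.append((c[:cut], c[cut:]))  # (train part, validation part)
    return splits, test_chunks

# Toy usage: 100 points, 10 chunks -> 8 training chunks (7 train + 3 val
# points each) and 2 test chunks.
series = list(range(100))
splits, test = split_chunks(series, n_chunks=10)
print(len(splits), len(test))  # -> 8 2
```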
Hardware Specification — No: No specific hardware details (such as GPU/CPU models or memory specifications) are provided for the experiments. The paper focuses on algorithmic aspects and performance across datasets without specifying the hardware used for computation.
Software Dependencies — Yes: "In particular, we apply the quadratic programming solver provided by the CVXOPT Python package (Andersen et al. 2013). ... The implementation of the ReWTS ensemble model is integrated within the Python package Darts (Herzen et al. 2022). ... Our software package leverages the Hydra framework (Yadan 2019)... It further features a pipeline using the Optuna package for hyperparameter optimization (Akiba et al. 2019)."
Experiment Setup — Yes: "For both the ReWTS ensemble model and the global model, a recurrent neural network (RNN) of the long short-term memory (LSTM) type (Staudemeyer and Morris 2019) is applied with an input length of 80 time steps (ensuring at least one full period of the preceding sine wave), and equal hyperparameter values such as optimizer (Adam), batch size, early stopping, and max epochs. The difference lies in the dimension of the hidden LSTM layer. ... each chunk model is set to have a single LSTM layer of dimension 10, yielding 500 learnable parameters."
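The "500 learnable parameters" figure for a single LSTM layer of dimension 10 can be sanity-checked with the standard PyTorch-style parameter count. This is an order-of-magnitude check only: the exact total depends on the input dimensionality and on any output head, which the excerpt does not specify, so the count below is not expected to reproduce the paper's number exactly.

```python
# PyTorch-style parameter count for one LSTM layer: 4 gates, each with
# input weights (hidden x input), recurrent weights (hidden x hidden),
# and two bias vectors (input-hidden and hidden-hidden).
def lstm_params(input_size, hidden_size):
    return 4 * (hidden_size * (input_size + hidden_size) + 2 * hidden_size)

# With an assumed univariate input and the hidden dimension of 10 from the
# excerpt, this gives 520, the same order as the ~500 parameters cited.
print(lstm_params(1, 10))  # -> 520
```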