DynaConF: Dynamic Forecasting of Non-Stationary Time Series

Authors: Siqi Liu, Andreas Lehrmann

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We compare our approach with 2 univariate and 9 multivariate time series models on synthetic (Section 4.1) and real-world (Section 4.2 and 4.3) datasets; see Table 1 for an overview, including references to the relevant literature and implementations. For evaluation we use a rolling-window approach with a window size of 10 steps. The final evaluation metrics are the aggregated results from all 100 test windows. We report the mean squared error (MSE) and continuous ranked probability score (CRPS) (Matheson & Winkler, 1976), a commonly used score to measure how close the predicted distribution is to the true distribution (see Appendix E for details).
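The rolling-window evaluation and CRPS described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' code: `predict` is a hypothetical forecaster returning samples, and the CRPS is estimated with the standard sample-based form CRPS ≈ E|X − y| − 0.5·E|X − X′|.

```python
import numpy as np

def sample_crps(samples, y):
    """Sample-based CRPS estimator for a scalar observation y:
    CRPS ~= E|X - y| - 0.5 * E|X - X'| over forecast samples X, X'."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.abs(samples - y).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

def rolling_window_eval(series, predict, window=10, n_windows=100):
    """Evaluate `predict` on consecutive non-overlapping test windows,
    aggregating MSE and CRPS over all windows, as in the quoted protocol.
    `predict(history)` is assumed to return samples of shape (S, window)."""
    mses, crpss = [], []
    start = len(series) - n_windows * window
    for w in range(n_windows):
        t = start + w * window
        history, target = series[:t], series[t:t + window]
        samples = predict(history)           # forecast samples, (S, window)
        point = samples.mean(axis=0)         # point forecast for MSE
        mses.append(np.mean((point - target) ** 2))
        crpss.append(np.mean([sample_crps(samples[:, h], target[h])
                              for h in range(window)]))
    return float(np.mean(mses)), float(np.mean(crpss))
```

By the triangle inequality the sample-based CRPS is always non-negative, and it is zero only when every forecast sample equals the observation.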
Researcher Affiliation Industry Siqi Liu (EMAIL), Borealis AI; Andreas Lehrmann (EMAIL), Borealis AI
Pseudocode No The paper describes the methodology using mathematical equations and descriptive text, and provides an architecture diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes The core of our model, called DynaConF [1], is a clean decoupling of the time-variant (non-stationary) and the time-invariant (stationary) part of the distribution. [1] https://github.com/BorealisAI/dcf
Open Datasets Yes We evaluate the proposed method on 6 widely-used datasets [7] with published results (Lai et al., 2018; Salinas et al., 2019): (Exchange) daily exchange rates of 8 different countries from 1990 to 2016; (Solar) [8] hourly solar power production in 137 PV plants in 2006; (Electricity) [9] hourly electricity consumption of 370 customers from 2012 to 2014; (Traffic) [10] hourly occupancy data at 963 sensor locations in the San Francisco Bay area; (Taxi) rides taken in 30-minute intervals at 1214 locations in New York City in January 2015/2016; (Wikipedia) daily page views of 2000 Wikipedia articles. ... [8] http://www.nrel.gov/grid/solar-power-data.html [9] https://archive.ics.uci.edu/ml/datasets/Electricity_Load_Diagrams2011_2014 [10] http://pems.dot.ca.gov. We further evaluate our method against state-of-the-art baselines on two more publicly available datasets: (Walmart) [11] weekly sales of 45 Walmart stores from February 2010 to October 2012; (Temperature) [12] monthly average temperatures of 1000 cities from January 1980 to September 2020. ... [11] https://www.kaggle.com/datasets/yasserh/walmart-dataset [12] https://www.kaggle.com/datasets/hansukyang/temperature-history-of-1000-cities-1980-to-2020
Dataset Splits Yes For our experiments on synthetic data we simulate four conditionally non-stationary stochastic processes for T = 2500 time steps, where we use the first 1000 steps as training data, the next 500 steps as validation data, and the remaining 1000 steps as test data. ... We use the last 10% of the training time period as the validation set and choose the initial learning rate, number of training epochs, and model sizes using the performance on the validation set for StatiConF. ... For Walmart, the forecast window size is 4 weeks, and the test set consists of the last 20 weeks. For Temperature, the forecast window size is 3 months, and the test set consists of the last 24 months.
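The chronological 1000 / 500 / 1000 split of the T = 2500 synthetic series quoted above can be written as a small helper. This is an illustrative sketch; the function name and signature are assumptions, not the authors' code.

```python
def chronological_split(series, n_train=1000, n_val=500, n_test=1000):
    """Split a time series in temporal order into train / validation / test,
    defaulting to the 1000 / 500 / 1000 split of T = 2500 synthetic steps."""
    assert len(series) >= n_train + n_val + n_test
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```

Splitting in time order (rather than randomly) is essential here, since the processes are non-stationary and the task is forecasting future behavior from past data.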
Hardware Specification Yes OOM: The model ran out of memory on our 16GB GPU using the minimum batch size and lookback window size.
Software Dependencies No The paper mentions software components like "Adam" (optimizer), "GluonTS", "PyTorchTS", and "TSlib" (libraries/frameworks), but does not provide specific version numbers for any of these components as used in the authors' own implementation or experiments.
Experiment Setup Yes For our models, we use Adam (Kingma & Ba, 2014) as the optimizer with the default initial learning rate of 0.001 unless it is chosen using the validation set. The dimension of the latent vector zt,i (see Section 3.2) is set to E = 4 across all the experiments. ... We perform 50 updates per epoch. We use 32 hidden units for our 2-layer MLP encoder. ... For our method, we first train StatiConF and then reuse its learned encoder in DynaConF, so the optimization of DynaConF is focused on the dynamic model. For DynaConF, we use the same validation set to choose the number of training epochs and use 0.01 as the initial learning rate. ... Our models use a two-layer LSTM with 128 hidden units as the encoder, except for the 8-dimensional Exchange data, where the hidden size is 8.
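The hyperparameters quoted in this row can be collected into a configuration sketch. The key names and two-dictionary layout below are hypothetical (the paper does not publish a configuration schema); only the values are taken from the excerpt.

```python
# Hypothetical configuration; key names are illustrative, values are from the
# quoted experiment setup. Not the authors' actual configuration format.
STATICONF_CONFIG = {
    "optimizer": "Adam",
    "lr": 1e-3,              # default initial learning rate, unless tuned
    "latent_dim": 4,         # dimension E of the latent vector z_{t,i}
    "updates_per_epoch": 50,
    "mlp_encoder": {"layers": 2, "hidden_units": 32},
    "lstm_encoder": {"layers": 2, "hidden_units": 128},  # 8 for Exchange
}

DYNACONF_CONFIG = {
    **STATICONF_CONFIG,
    "lr": 1e-2,              # DynaConF uses 0.01 as the initial learning rate
    "reuse_encoder": True,   # encoder is taken from the trained StatiConF
}
```

The two-stage structure mirrors the quoted training procedure: StatiConF is trained first, and DynaConF then reuses its encoder so that optimization focuses on the dynamic model.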