DeepRRTime: Robust Time-series Forecasting with a Regularized INR Basis

Authors: Chandramouli Shama Sastry, Mahdi Gilany, Kry Yik-Chau Lui, Martin Magill, Alexander Pashevich

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we evaluate the effectiveness of our regularization method, DeepRRTime, on real-world datasets and compare it with relevant baselines. In addition to the regular time-series forecasting problem, we also test its robustness in three more challenging settings to illuminate advantages of the proposed regularization technique: forecasting with missing values in the lookback window, training with reduced dataset size, and forecasting on a finer time grid at test time than the one used during training. For evaluation, we use 6 real-world benchmarks for LTSF which include Electricity Transformer Temperature (ETTm2), Electricity Consumption Load (ECL), Exchange, Traffic, Weather, and Influenza-like Illness (ILI), detailed in Appendix B. For each dataset, we evaluate our model on tasks with four distinct forecast horizons, as is standard practice in the related literature. We also use the standard train, validation, and test set splits: 60/20/20 for ETTm2 and 70/10/20 for the remaining 5 datasets (Woo et al., 2023). We preprocess each dataset by standardization based on train set statistics. To compare our method with other approaches, we employ two metrics: mean squared error (MSE) and mean absolute error (MAE).
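The preprocessing and metrics quoted above (standardize each dataset with train-split statistics, then score with MSE and MAE) can be sketched in plain Python. This is a minimal illustration with hypothetical helper names, not the paper's implementation, which follows the DeepTime codebase:

```python
def standardize(train, other):
    """Scale both splits using the mean/std computed on the train split ONLY,
    so no test-set statistics leak into preprocessing."""
    n = len(train)
    mean = sum(train) / n
    std = (sum((x - mean) ** 2 for x in train) / n) ** 0.5
    scale = lambda xs: [(x - mean) / std for x in xs]
    return scale(train), scale(other)

def mse(y_true, y_pred):
    """Mean squared error over a forecast horizon."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error over a forecast horizon."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

The key point is the order of operations: statistics come from the training split, and the same affine transform is then applied to validation and test data.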
Researcher Affiliation Collaboration Chandramouli Sastry EMAIL Dalhousie University Vector Institute RBC Borealis Mahdi Gilany EMAIL Queen's University RBC Borealis Kry Yik-Chau Lui EMAIL RBC Borealis Martin Magill EMAIL RBC Borealis Alexander Pashevich EMAIL RBC Borealis
Pseudocode No The paper describes the DeepTime and DeepRRTime methods using mathematical formulations, equations (e.g., Equations 1, 2, 3, 4, 7, 9) and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code Yes Our model code is available at https://github.com/BorealisAI/DeepRRTime.
Open Datasets Yes For evaluation, we use 6 real-world benchmarks for LTSF which include Electricity Transformer Temperature (ETTm2), Electricity Consumption Load (ECL), Exchange, Traffic, Weather, and Influenza-like Illness (ILI), detailed in Appendix B. For each dataset, we evaluate our model on tasks with four distinct forecast horizons, as is standard practice in the related literature. Appendix B: ETTm2 (Zhou et al., 2021) Electricity Transformer Temperature dataset provides measurements from an electricity transformer such as load and oil temperature at a 15-minute frequency. https://github.com/zhouhaoyi/ETDataset ECL The Electricity Consumption Load dataset comprises electricity usage data for 321 households, gathered between 2012 and 2014. Originally recorded every 15 minutes, the data is compiled into hourly aggregates. https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014 Exchange (Lai et al., 2018) Dataset provides exchange rates of USD with currencies of eight countries (Australia, United Kingdom, Canada, Switzerland, China, Japan, New Zealand, and Singapore) from 1990 to 2016 at a daily frequency. https://github.com/laiguokun/multivariate-time-series-data Traffic Dataset from the California Department of Transportation provides road occupancy rates from 862 sensors located on the freeways of the San Francisco Bay area at an hourly frequency. https://pems.dot.ca.gov/ Weather Dataset provides measurements of 21 meteorological indicators such as air temperature and humidity throughout 2020 at a 10-minute frequency from the Weather Station of the Max Planck Biogeochemistry Institute. https://www.bgc-jena.mpg.de/wetter/ ILI Influenza-like Illness dataset provides the ratio of patients seen with ILI to the total number of patients, collected by the Centers for Disease Control and Prevention of the United States between 2002 and 2021 at a weekly frequency. https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
Dataset Splits Yes We also use the standard train, validation, and test sets splits: 60/20/20 for ETTm2 and 70/10/20 for the remaining 5 datasets (Woo et al., 2023).
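The quoted splits (60/20/20 for ETTm2, 70/10/20 elsewhere) are chronological, as is standard for forecasting benchmarks: no shuffling, earliest data for training. A minimal sketch of such a split, with a hypothetical helper name:

```python
def chronological_split(series, ratios=(0.7, 0.1, 0.2)):
    """Split a time series into contiguous train/val/test segments,
    preserving temporal order (no shuffling)."""
    n = len(series)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test
```

For ETTm2 the ratios would be `(0.6, 0.2, 0.2)`; the remaining five datasets use `(0.7, 0.1, 0.2)`.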
Hardware Specification Yes We measure the wall-clock time per epoch and peak memory usage in both train and inference modes when using PatchTST and DeepRRTime on a single NVIDIA Tesla V GPU (16GB).
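The wall-clock-per-epoch measurement mentioned here can be sketched with the standard library; `run_epoch` is a hypothetical callback standing in for one training or inference epoch (the paper does not publish its timing harness):

```python
import time

def time_epochs(run_epoch, n_epochs=3):
    """Measure wall-clock time per epoch; returns durations in seconds."""
    durations = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        run_epoch()
        durations.append(time.perf_counter() - start)
    return durations

# Peak GPU memory would be read separately, e.g. in PyTorch via
# torch.cuda.reset_peak_memory_stats() before the epoch and
# torch.cuda.max_memory_allocated() after it, when running on a GPU.
```

In practice the first epoch is often discarded as warm-up before averaging.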
Software Dependencies No The paper mentions using the Adam optimizer (Kingma & Ba, 2014), ReLU activation, Dropout (Srivastava et al., 2014), and LayerNorm (Ba et al., 2016). It also states that the implementation is based on the DeepTime paper's code. However, it does not provide specific version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow) or tools used in the experimental setup.
Experiment Setup Yes For each dataset, we evaluate our model on tasks with four distinct forecast horizons, as is standard practice in the related literature. We also use the standard train, validation, and test set splits: 60/20/20 for ETTm2 and 70/10/20 for the remaining 5 datasets (Woo et al., 2023). We preprocess each dataset by standardization based on train set statistics... Hyperparameter selection, when applied, is performed for the lookback length multiplier µ which defines the length of the lookback window L relative to the prediction horizon H as L = µH. We search through the values µ ∈ {1, 3, 5, 7, 9}, and select the best value based on the validation loss. Throughout this work, we use identical values for all common hyperparameters of DeepTime and DeepRRTime besides the lookback multiplier µ. We recompute the results of DeepTime using the original open source code implementation, and report errors over 10 random network initializations. We emphasize that our implementation is largely based on the original code of DeepTime and the proposed regularization requires minimal code changes. Please refer to Appendix C for a complete list of hyperparameters and further implementation details. Table 7: Hyperparameters used in our experiments. Parameters inherited from DeepTime: Epochs 50, Learning rate 1e-3, λ1 learning rate 1.0, Warm-up epochs 5, Batch size 256, Early stopping patience 7, Max gradient norm 10.0, Layer size 256, λ1 initialization 0.0, Scales [0.01, 0.1, 1, 5, 10, 20, 50, 100], Fourier features size 4096, INR dropout rate 0.1, Lookback length multiplier µ ∈ {1, 3, 5, 7, 9}. Our parameters: λ2 1.0.
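The lookback-multiplier search quoted above (L = µH, µ ∈ {1, 3, 5, 7, 9}, chosen by validation loss) amounts to a small grid search. A minimal sketch, where `validate` is a hypothetical callback that trains/evaluates a model for a given lookback window and returns its validation loss:

```python
def select_lookback_multiplier(validate, horizon, candidates=(1, 3, 5, 7, 9)):
    """Grid-search the lookback multiplier µ, where the lookback window
    length is L = µ * H, keeping the µ with the lowest validation loss."""
    best_mu, best_loss = None, float("inf")
    for mu in candidates:
        loss = validate(mu * horizon, horizon)  # lookback length L = µH
        if loss < best_loss:
            best_mu, best_loss = mu, loss
    return best_mu
```

All other hyperparameters are held fixed across DeepTime and DeepRRTime, so this one-dimensional search is the only per-dataset tuning described.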