On the Regularization of Learnable Embeddings for Time Series Forecasting

Authors: Luca Butera, Giovanni De Felice, Andrea Cini, Cesare Alippi

TMLR 2025

Reproducibility assessment — each entry lists the variable, the assessed result, and the LLM's supporting response:
Research Type: Experimental
  LLM response: "Specifically, we perform the first extensive empirical study on the subject and show how such regularizations consistently improve performance in widely adopted architectures." and "5 Experiments: We evaluate the effectiveness of different regularization strategies for local embeddings under three different scenarios: time series forecasting benchmarks (Sec. 5.1), transfer learning (Sec. 5.3), and a sensitivity analysis through embedding perturbations (Sec. 5.4)."
Researcher Affiliation: Academia
  LLM response: "Luca Butera (Università della Svizzera Italiana, IDSIA)", "Giovanni De Felice (University of Liverpool)", "Andrea Cini (Università della Svizzera Italiana, IDSIA)", "Cesare Alippi (Università della Svizzera Italiana, IDSIA; Politecnico di Milano)"
Pseudocode: No
  LLM response: The paper describes its methods and procedures in narrative text and mathematical equations, but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes
  LLM response: "Reproducibility: Python code to reproduce the experiments is available online." The linked repository is https://github.com/LucaButera/TS-embedding-regularization
Open Datasets: Yes
  LLM response: "We consider six real-world datasets of time series collections, spanning four different application domains: METR-LA and PEMS-BAY (Li et al., 2018) as two established benchmarks for traffic forecasting, AQI (Zheng et al., 2015) from the air quality domain, CER-E (CER, 2016) from the energy consumption domain, CLM-D (De Felice et al., 2024) and EngRAD (Marisca et al., 2024) as two multivariate climatic datasets. Details on the datasets, data splits and forecasting settings can be found in Appendix A." Appendix A provides specific links and citations for all datasets, e.g., "AQI: Air quality data from Zheng et al. (2015). The dataset collects measurements of the PM2.5 pollutant from air quality stations in 43 Chinese cities and is available at https://www.microsoft.com/en-us/research/publication/forecasting-fine-grained-air-quality-based-on-big-data/"
Dataset Splits: Yes
  LLM response: "All datasets were split 70%/10%/20% into train, validation and test along the temporal axis."
Hardware Specification: Yes
  LLM response: "Computing resources: Experiments were run on A100 and A5000 NVIDIA GPUs. The vast majority of the experiments conducted in our work can be easily run on moderate GPU hardware, with at least 8 GBs of VRAM."
Software Dependencies: No
  LLM response: "We used the Python (Van Rossum & Drake, 2009) programming language, leveraging Torch Spatiotemporal (Cini & Marisca, 2022), PyTorch (Paszke et al., 2019) and PyTorch Lightning (Falcon & The PyTorch Lightning team, 2019) to implement all the experiments." The paper names its software stack but does not provide specific version numbers for PyTorch, PyTorch Lightning, or Torch Spatiotemporal, which are crucial for reproducibility.
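Since missing version pins are the issue flagged here, a minimal sketch of how a run could record the exact versions it used; the distribution names are assumptions based on the PyPI packages for the cited libraries:

```python
from importlib import metadata

def pinned_versions(packages):
    """Return {package: installed version}, or 'missing' when a
    distribution is not installed, for logging alongside results."""
    out = {}
    for pkg in packages:
        try:
            out[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            out[pkg] = "missing"
    return out

# Assumed PyPI distribution names for the paper's stack.
print(pinned_versions(["torch", "pytorch-lightning", "torch-spatiotemporal"]))
```

Logging this dictionary with every experiment run makes results reproducible even when the repository omits a lockfile.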
Experiment Setup: Yes
  LLM response: "For the experiments in Tab. 1, optimal hyperparameters, i.e., learning rate lr and hidden size dh, for each model and dataset, were found via a grid-search over lr ∈ [0.00025, 0.00075, 0.0015, 0.003] and dh ∈ [32, 64, 128, 256]. ... This resulted in: weight of L2 λl2 = 0.0001, weight of L1 λl1 = 0.00001, weight of variational regularization λvar = 0.00005 and weight of clustering regularization λclst = 0.0005. Dropout's probability of dropping a connection was set to p = 0.5. ... We set the resetting period of forgetting to k = 20 epochs with 30 epochs of warm-up. ... The training lasted up to 150 epochs with 50 epochs of early stopping patience, while fine-tuning lasted up to 1000 epochs with 100 epochs of patience."
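The quoted grid-search can be sketched as an exhaustive sweep over the learning-rate and hidden-size grid. Only the grid values and regularization weights come from the text above; `run_experiment` is a hypothetical stand-in for the paper's full training loop:

```python
from itertools import product

# Grid values and regularization weights quoted from the setup above.
GRID = {
    "lr": [0.00025, 0.00075, 0.0015, 0.003],
    "d_h": [32, 64, 128, 256],
}
REG_WEIGHTS = {"l2": 1e-4, "l1": 1e-5, "var": 5e-5, "clst": 5e-4}

def grid_search(run_experiment):
    """Sweep every (lr, d_h) pair and keep the configuration with the
    lowest validation loss. `run_experiment` is a hypothetical callable
    standing in for one complete training run."""
    best_loss, best_cfg = float("inf"), None
    for lr, d_h in product(GRID["lr"], GRID["d_h"]):
        val_loss = run_experiment(lr=lr, d_h=d_h, reg_weights=REG_WEIGHTS)
        if val_loss < best_loss:
            best_loss, best_cfg = val_loss, {"lr": lr, "d_h": d_h}
    return best_cfg, best_loss
```

With 4 learning rates and 4 hidden sizes, this is 16 runs per model/dataset pair, which matches the moderate-hardware claim in the report.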