AT4TS: Autotune for Time Series Foundation Models

Authors: Shivani Tomar, Seshu Tirupathi, Radu Marinescu, Elizabeth M. Daly, Ivana Dusparic

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on real-world benchmark datasets demonstrate that AT4TS efficiently identifies the optimal configuration of tunable hyperparameters for autotuning TSFMs. We show improvements as high as 20.55% and 45.34% for one of the out-of-domain datasets compared to zero-shot pre-trained models for Chronos and TTM respectively.
Researcher Affiliation | Collaboration | Shivani Tomar (EMAIL), Trinity College Dublin and IBM Research, Ireland; Seshu Tirupathi (EMAIL), IBM Research, Ireland; Radu Marinescu (EMAIL), IBM Research, Ireland; Elizabeth M. Daly (EMAIL), IBM Research, Ireland; Ivana Dusparic (EMAIL), Trinity College Dublin.
Pseudocode | Yes | Algorithm 1 outlines the steps involved in AT4TS.
Open Source Code | No | AT4TS has been implemented using the Ray Tune (Liaw et al., 2018) and Transformers (https://huggingface.co/docs/transformers) libraries. It supports both Transformer and non-Transformer TSFMs as well as parameter-efficient fine-tuning (backed by the PEFT library, https://huggingface.co/docs/peft).
Open Datasets | Yes | For our experiments, we use 10 univariate datasets from the Monash Time Series Forecasting Repository (Godahewa et al., 2021) for autotuning Chronos models. To further validate the efficacy of our approach in an unseen target domain, we use a real-world wind energy dataset called MORE (https://github.com/MORE-EU/Open Data) containing exogenous channels.
Dataset Splits | Yes | For each dataset, the train split includes all the data points from the beginning of the time series up until the last two prediction-horizon lengths, which are held out for validation and testing respectively. We use the GluonTS library to split the datasets using the same strategy as in (Ansari et al., 2024).
Hardware Specification | Yes | The experiments were performed on a multi-node cluster environment using a combination of CPUs and A100 GPUs.
Software Dependencies | No | AT4TS has been implemented using the Ray Tune (Liaw et al., 2018) and Transformers (https://huggingface.co/docs/transformers) libraries. It supports both Transformer and non-Transformer TSFMs as well as parameter-efficient fine-tuning (backed by the PEFT library, https://huggingface.co/docs/peft). While these libraries are mentioned, specific version numbers are not provided for Ray Tune, Transformers, or PEFT.
Experiment Setup | Yes | Tables 2 and 4 show the search space for LoRA hyperparameters and custom fine-tuning hyperparameters used in autotuning Chronos and TTM models respectively. We limit the number of trials to 10 in the case of Chronos and 15 for TTM... In our case, we set this to 8 for Chronos and 2 for TTM. To ensure a comprehensive evaluation across different fine-tuning settings in the case of Chronos, we use mean absolute scaled error (MASE) as the evaluation metric... For TTM, we use mean squared error (MSE) as the standard error metric and report average MSE scores across five forecast lengths, FLs {24, 48, 60, 96, 192}, similar to (Ekambaram et al., 2024).
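The split strategy in the "Dataset Splits" row (hold out the last two prediction-horizon windows, one for validation and one for testing) can be sketched in plain Python. This is an illustrative stand-in, not the GluonTS utility the paper actually uses:

```python
def split_series(series, horizon):
    """Hold out the last two horizon-length windows of a univariate
    series: the first for validation, the second for testing."""
    if len(series) < 3 * horizon:
        raise ValueError("series too short for two held-out horizons")
    train = series[: -2 * horizon]          # everything before the held-out windows
    val = series[-2 * horizon : -horizon]   # second-to-last horizon window
    test = series[-horizon:]                # final horizon window
    return train, val, test

# Example: a 10-point series with a prediction horizon of 2.
train, val, test = split_series(list(range(10)), horizon=2)
# train -> [0, 1, 2, 3, 4, 5], val -> [6, 7], test -> [8, 9]
```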
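For reference, the MASE metric named in the "Experiment Setup" row scales the forecast's mean absolute error by the in-sample error of a seasonal naive forecast. A minimal implementation (the seasonal period `m` is an assumption here, not a value taken from the paper):

```python
def mase(y_true, y_pred, y_train, m=1):
    """Mean absolute scaled error: forecast MAE divided by the
    in-sample MAE of the m-step seasonal naive forecast."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [abs(y_train[i] - y_train[i - m]) for i in range(m, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale

# Toy check: the naive one-step forecast on [1, 2, 3, 4] has MAE 1,
# so a forecast with MAE 0.5 yields MASE 0.5.
score = mase([5, 6], [5.5, 6.5], [1, 2, 3, 4], m=1)  # -> 0.5
```

A MASE below 1 means the forecast beats the in-sample naive baseline, which makes scores comparable across series of different scales.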