Towards Neural Scaling Laws for Time Series Foundation Models

Authors: Qingren Yao, Chao-Han Huck Yang, Renhe Jiang, Yuxuan Liang, Ming Jin, Shirui Pan

ICLR 2025

Reproducibility Assessment (each variable lists the result, followed by the supporting LLM response)
Research Type: Experimental — "Our experiments reveal that the negative log-likelihood of TSFMs exhibits similar scaling behavior in both OOD and ID settings. We further compare the scaling properties across different architectures, incorporating two state-of-the-art TSFMs as case studies, showing that model architecture plays a significant role in scaling."
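The scaling behavior mentioned above is typically summarized by fitting a power law L(N) = a * N^(-alpha) to the loss as a function of scale. A minimal sketch of such a fit, using made-up (model size, NLL) pairs rather than the paper's actual measurements:

```python
import numpy as np

# Hypothetical (parameter count, validation NLL) pairs for illustration only;
# these are NOT the paper's numbers.
params = np.array([1e6, 1e7, 1e8, 1e9])
nll = np.array([1.20, 0.95, 0.78, 0.66])

# Fit L(N) = a * N^(-alpha) by linear regression in log-log space:
# log L = log a - alpha * log N.
slope, log_a = np.polyfit(np.log(params), np.log(nll), 1)
alpha = -slope  # scaling exponent (positive when loss falls with scale)
```

Comparing the exponent `alpha` fitted on ID validation data with one fitted on OOD test data is one straightforward way to check whether the two settings scale similarly.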
Researcher Affiliation: Collaboration — "Qingren Yao [1,2], Chao-Han Huck Yang [3], Renhe Jiang [4], Yuxuan Liang [2], Ming Jin [1], Shirui Pan [1]; [1] Griffith University, [2] The Hong Kong University of Science and Technology (Guangzhou), [3] NVIDIA Research, [4] The University of Tokyo"
Pseudocode: No — "The paper describes methods and processes in narrative text and mathematical formulations but does not contain any clearly labeled pseudocode or algorithm blocks."
Open Source Code: Yes — "The source code and related resources of this work are available at https://github.com/Qingrenn/TSFM-ScalingLaws for reproducibility."
Open Datasets: Yes — "To this end, we constructed our time series corpus for TSFM pre-training from the large-scale open time series archive, LOTSA (Woo et al., 2024). The corpus comprises approximately 17B time points from 39 datasets spanning seven distinct domains. ... A detailed breakdown of the data sources is provided in Appendix A, with a summary in Table 1."
Dataset Splits: Yes — "For each subset, 95% of the data was allocated for model training, with the remaining 5% reserved as a validation set to evaluate in-distribution forecasting performance. Additionally, we used a subset from a widely recognized long-sequence prediction benchmark (Wu et al., 2023) to test the model's out-of-distribution forecasting capabilities. To further enhance the reliability, we also incorporated a subset of the Monash dataset (Godahewa et al., 2021) as additional OOD test data."
Hardware Specification: No — "The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications)."
Software Dependencies: No — "The paper mentions using the AdamW optimizer but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation."
Experiment Setup: Yes — "Our training objective is to optimize the mixture distribution log-likelihood. We utilize the AdamW optimizer with a batch size of 128, and a maximum learning rate of 10^-3 with a linear warm-up of 10^4 training steps, followed by cosine decay for the remaining 9×10^4 steps. ... In our baseline models, the patch size P is set to 32. ... We sample 15%–50% lengths as forecast horizon and the remaining as context horizon, for a given time series."
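The learning-rate schedule quoted above (linear warm-up to 10^-3 over 10^4 steps, then cosine decay over the remaining 9×10^4 steps) can be sketched as a plain step-to-LR function; decaying to exactly zero at the final step is an assumption, since the paper does not state a floor value:

```python
import math

PEAK_LR = 1e-3        # maximum learning rate from the paper
WARMUP_STEPS = 10_000  # linear warm-up phase (10^4 steps)
DECAY_STEPS = 90_000   # cosine decay phase (9 * 10^4 steps)

def lr_at(step):
    """Learning rate at a given training step under warm-up + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 (assumed floor) over DECAY_STEPS.
    progress = min((step - WARMUP_STEPS) / DECAY_STEPS, 1.0)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop this function would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR` as a multiplicative factor on the base rate.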