Zero-shot forecasting of chaotic systems
Authors: Yuanzhao Zhang, William Gilpin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 135 distinct chaotic dynamical systems and 10^8 timepoints, we find that foundation models produce competitive forecasts compared to custom-trained models (including NBEATS, TiDE, etc.), particularly when training data is limited. Our main contributions are: 1. A large-scale evaluation of the ability of time series foundation models to model physical systems outside of their training domain. 2. Discovery that foundation models produce zero-shot forecasts competitive with models custom-trained to forecast chaotic attractors. Moreover, larger foundation models produce better forecasts. |
| Researcher Affiliation | Academia | Yuanzhao Zhang Santa Fe Institute Santa Fe, NM, USA; William Gilpin Department of Physics University of Texas at Austin Austin, TX, USA; Correspondence to EMAIL |
| Pseudocode | No | The paper describes methods and algorithms in narrative text and figures (e.g., Figure 1 for the benchmark pipeline), but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All zero-shot benchmark forecast results and scripts are available online at https://github.com/williamgilpin/dysts_data. The dynamical systems benchmark dataset is available online at https://github.com/williamgilpin/dysts |
| Open Datasets | Yes | The dysts dataset represents a standardized benchmark of 135 low-dimensional chaotic systems, described by ordinary differential equations that have been aligned with respect to their dominant timescales and integration steps (Gilpin, 2021; 2023). The dynamical systems benchmark dataset is available online at https://github.com/williamgilpin/dysts |
| Dataset Splits | Yes | For each of the 135 chaotic dynamical systems, 20 trajectories of length 812 are generated... All time series are then split into training sets consisting of the first 512 points of each time series, with the last 300 timepoints set aside to determine final test scores. For experiments with varying context lengths, trajectories are extended backwards in time, so that the 300 test points remain the same. For a given dynamical system, each of the 20 training trajectories is divided into a true training set comprising the first 435 timepoints, and a validation set of the last 77 timepoints. |
| Hardware Specification | Yes | The experiments require 10^4 walltime compute hours on an Nvidia A100 GPU. We measure the walltime of training and inference on a single A100 GPU node. |
| Software Dependencies | No | The paper mentions several models (NBEATS, TiDE, NVAR, Transformer, LSTM) and the Darts forecasting library (Herzen et al., 2022). While hyperparameters are provided, specific version numbers for software dependencies like Python, PyTorch, or the Darts library itself are not given. |
| Experiment Setup | Yes | For the baseline models, hyperparameter tuning is performed separately for each of the 135 dynamical systems. For a given dynamical system, each of the 20 training trajectories is divided into a true training set comprising the first 435 timepoints, and a validation set of the last 77 timepoints. For each set of hyperparameters, a model is trained on the true training set and then evaluated on the validation set. ... Appendix F.1 BASELINE MODEL HYPERPARAMETERS lists specific hyperparameters for N-BEATS (e.g., Input Length, Number of Stacks: 30, Number of Blocks: 1, Number of Layers: 4, Dropout Fraction: 0.0), Transformer (e.g., Number Attention Heads: 4, Number Encoder Layers: 3), TiDE, NVAR, and LSTM models. |
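The split scheme in the Dataset Splits row can be sketched as follows. This is a minimal illustration of the arithmetic (812 = 512 train + 300 test; 512 = 435 true-train + 77 validation), not the paper's released code; the function and constant names are ours.

```python
import numpy as np

# Per-trajectory split described in the paper: each length-812 trajectory
# yields 512 training points and 300 held-out test points; the training
# portion is further cut into 435 true-training and 77 validation points.
TRAJ_LEN, TRAIN_LEN = 812, 512
TRUE_TRAIN_LEN = 435  # remainder of the training portion (77) is validation

def split_trajectory(traj):
    """Return (true_train, validation, test) views of one trajectory."""
    assert len(traj) == TRAJ_LEN
    train, test = traj[:TRAIN_LEN], traj[TRAIN_LEN:]
    true_train, val = train[:TRUE_TRAIN_LEN], train[TRUE_TRAIN_LEN:]
    return true_train, val, test

traj = np.arange(TRAJ_LEN, dtype=float)
tt, val, test = split_trajectory(traj)
print(len(tt), len(val), len(test))  # → prints "435 77 300"
```

Note that the test block is always the final 300 points; the paper extends trajectories backwards in time for longer contexts, so this tail stays fixed across context lengths.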
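The per-system tuning procedure in the Experiment Setup row (train each hyperparameter candidate on the true training set, score it on the validation set, keep the best) can be sketched generically. The "model" below is a toy ridge-regularized AR(1) fit standing in for the paper's Darts baselines; all names and the candidate grid are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Toy stand-in for one system's 435/77 true-train/validation split.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 512)) + 0.05 * rng.standard_normal(512)
true_train, val = series[:435], series[435:]

def fit_ar1(x, ridge):
    # Least-squares AR(1) coefficient with ridge regularization.
    return np.dot(x[:-1], x[1:]) / (np.dot(x[:-1], x[:-1]) + ridge)

def val_mse(coef, history, target):
    # Iterated one-step forecasts rolled out over the validation horizon.
    preds, last = [], history[-1]
    for _ in range(len(target)):
        last = coef * last
        preds.append(last)
    return float(np.mean((np.array(preds) - target) ** 2))

# Select the hyperparameter with the lowest validation error.
candidates = [0.0, 0.1, 1.0, 10.0]
best = min((val_mse(fit_ar1(true_train, r), true_train, val), r)
           for r in candidates)
print("best ridge:", best[1])
```

In the paper this loop runs separately for each of the 135 systems, with the grids listed in Appendix F.1 and the final score computed on the held-out 300-point test set.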