Zero-shot forecasting of chaotic systems
Authors: Yuanzhao Zhang, William Gilpin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 135 distinct chaotic dynamical systems and 10^8 timepoints, we find that foundation models produce competitive forecasts compared to custom-trained models (including NBEATS, TiDE, etc.), particularly when training data is limited. Our main contributions are: 1. A large-scale evaluation of the ability of time series foundation models to model physical systems outside of their training domain. 2. Discovery that foundation models produce zero-shot forecasts competitive with models custom-trained to forecast chaotic attractors. Moreover, larger foundation models produce better forecasts. |
| Researcher Affiliation | Academia | Yuanzhao Zhang Santa Fe Institute Santa Fe, NM, USA; William Gilpin Department of Physics University of Texas at Austin Austin, TX, USA; Correspondence to EMAIL |
| Pseudocode | No | The paper describes methods and algorithms in narrative text and figures (e.g., Figure 1 for the benchmark pipeline), but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | All zero-shot benchmark forecast results and scripts are available online at https://github.com/williamgilpin/dysts_data. The dynamical systems benchmark dataset is available online at https://github.com/williamgilpin/dysts |
| Open Datasets | Yes | The dysts dataset represents a standardized benchmark of 135 low-dimensional chaotic systems, described by ordinary differential equations that have been aligned with respect to their dominant timescales and integration steps (Gilpin, 2021; 2023). The dynamical systems benchmark dataset is available online at https://github.com/williamgilpin/dysts |
| Dataset Splits | Yes | For each of the 135 chaotic dynamical systems, 20 trajectories of length 812 are generated... All time series are then split into training sets consisting of the first 512 points of each time series, with the last 300 timepoints set aside to determine final test scores. For experiments with varying context lengths, trajectories are extended backwards in time, so that the 300 test points remain the same. For a given dynamical system, each of the 20 training trajectories is divided into a true training set comprising the first 435 timepoints, and a validation set of the last 77 timepoints. |
| Hardware Specification | Yes | The experiments require 10^4 walltime compute hours on an Nvidia A100 GPU. We measure the walltime of training and inference on a single A100 GPU node. |
| Software Dependencies | No | The paper mentions several models (NBEATS, TiDE, NVAR, Transformer, LSTM) and the Darts forecasting library (Herzen et al., 2022). While hyperparameters are provided, specific version numbers for software dependencies like Python, PyTorch, or the Darts library itself are not given. |
| Experiment Setup | Yes | For the baseline models, hyperparameter tuning is performed separately for each of the 135 dynamical systems. For a given dynamical system, each of the 20 training trajectories is divided into a true training set comprising the first 435 timepoints, and a validation set of the last 77 timepoints. For each set of hyperparameters, a model is trained on the true training set and then evaluated on the validation set. ... Appendix F.1 BASELINE MODEL HYPERPARAMETERS lists specific hyperparameters for N-BEATS (e.g., Input Length, Number of Stacks: 30, Number of Blocks: 1, Number of Layers: 4, Dropout Fraction: 0.0), Transformer (e.g., Number Attention Heads: 4, Number Encoder Layers: 3), TiDE, NVAR, and LSTM models. |
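The split scheme in the Dataset Splits row can be sketched as follows. This is a minimal illustration of the arithmetic (812 = 512 train + 300 test; 512 = 435 true-train + 77 validation), not the paper's released code; the function and constant names are ours.

```python
import numpy as np

# Per-trajectory split described in the paper: each length-812 trajectory
# yields 512 training points and 300 held-out test points; the training
# portion is further cut into 435 true-training and 77 validation points.
TRAJ_LEN, TRAIN_LEN = 812, 512
TRUE_TRAIN_LEN = 435  # remainder of the training portion (77) is validation

def split_trajectory(traj):
    """Return (true_train, validation, test) views of one trajectory."""
    assert len(traj) == TRAJ_LEN
    train, test = traj[:TRAIN_LEN], traj[TRAIN_LEN:]
    true_train, val = train[:TRUE_TRAIN_LEN], train[TRUE_TRAIN_LEN:]
    return true_train, val, test

traj = np.arange(TRAJ_LEN, dtype=float)
tt, val, test = split_trajectory(traj)
print(len(tt), len(val), len(test))  # → prints "435 77 300"
```

Note that the test block is always the final 300 points; the paper extends trajectories backwards in time for longer contexts, so this tail stays fixed across context lengths.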
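The per-system tuning procedure in the Experiment Setup row (train each hyperparameter candidate on the true training set, score it on the validation set, keep the best) can be sketched generically. The "model" below is a toy ridge-regularized AR(1) fit standing in for the paper's Darts baselines; all names and the candidate grid are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Toy stand-in for one system's 435/77 true-train/validation split.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 512)) + 0.05 * rng.standard_normal(512)
true_train, val = series[:435], series[435:]

def fit_ar1(x, ridge):
    # Least-squares AR(1) coefficient with ridge regularization.
    return np.dot(x[:-1], x[1:]) / (np.dot(x[:-1], x[:-1]) + ridge)

def val_mse(coef, history, target):
    # Iterated one-step forecasts rolled out over the validation horizon.
    preds, last = [], history[-1]
    for _ in range(len(target)):
        last = coef * last
        preds.append(last)
    return float(np.mean((np.array(preds) - target) ** 2))

# Select the hyperparameter with the lowest validation error.
candidates = [0.0, 0.1, 1.0, 10.0]
best = min((val_mse(fit_ar1(true_train, r), true_train, val), r)
           for r in candidates)
print("best ridge:", best[1])
```

In the paper this loop runs separately for each of the 135 systems, with the grids listed in Appendix F.1 and the final score computed on the held-out 300-point test set.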