Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction

Authors: Yi He, Yiming Yang, Xiaoyuan Cheng, Hai Wang, Xiao Xue, Boli Chen, Yukun Hu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to operator-based and transformer-based methods, our model achieves better performance across five metrics, from short-term prediction accuracy to long-term statistics. In addition to our methodological contributions, we introduce new chaotic system benchmarks: a machine learning dataset of 140k snapshots of turbulent channel flow and a processed high-dimensional Kolmogorov Flow dataset, along with various evaluation metrics for both short- and long-term performance.
Researcher Affiliation | Academia | 1 Dynamic Systems, University College London, United Kingdom; 2 Statistical Science, University College London, United Kingdom; 3 Center for Computational Science, University College London, United Kingdom; 4 Electronic and Electrical Engineering, University College London, United Kingdom. Correspondence to: Yukun Hu <EMAIL>.
Pseudocode | Yes | Algorithm 1: L_unitary with Hutchinson's Stochastic Trace Estimator
Require: Operator Ĝ ∈ R^{d×d}, batch size B
1: Initialize L_unitary = 0
2: for b = 1 to B do
3:   Sample v^(b) ~ Unif(S^{d−1})
4:   q^(b) = (v^(b))^T Ĝ^T Ĝ v^(b)
5:   L_unitary ← L_unitary + |q^(b) − 1| / B
6: end for
7: return L_unitary
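A minimal NumPy sketch of the estimator above, assuming Ĝ is available as a dense d×d matrix. The function name `unitary_loss` and its arguments are illustrative, not taken from the paper:

```python
import numpy as np

def unitary_loss(G, B=16, rng=None):
    """Hutchinson-style estimate of how far G^T G deviates from the identity.

    G   : (d, d) operator matrix (stand-in for the learned operator G-hat)
    B   : number of random probe vectors
    rng : seed or numpy Generator for reproducibility
    """
    rng = np.random.default_rng(rng)
    d = G.shape[0]
    loss = 0.0
    for _ in range(B):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)      # v ~ Unif(S^{d-1}), uniform on the unit sphere
        q = v @ G.T @ G @ v         # quadratic form v^T G^T G v
        loss += abs(q - 1.0) / B    # penalize deviation from unitarity
    return loss
```

For an exactly unitary operator (e.g. the identity) the loss is zero up to floating-point error, which matches the intent of penalizing non-unitary dynamics.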
Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology it describes. While it references official implementations for baselines (e.g., UNO from https://github.com/neuraloperator/neuraloperator/tree/main, MNO from https://github.com/neuraloperator/markov_neural_operator/tree/main, MWT from https://github.com/gaurav71531/mwt-operator, FactFormer from https://github.com/BaratiLab/FactFormer/tree/main), it does not do so for its own proposed model.
Open Datasets | Yes | We introduce datasets of two high-dimensional chaotic systems: 1) turbulent channel flow with 140k snapshots in the 3D simulation; 2) the Kolmogorov Flow simulation of 185k 2D vorticity states. Both datasets, with details in Appendix D and G, have been carefully processed to ensure consistency and usability, making them well-suited for early-stage machine learning research on ergodic chaotic systems. He, Y. Benchmark dataset kf256 for Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction, May 2025. URL https://doi.org/10.5281/zenodo.14801580. He, Y., Xue, X., Hu, Y., Yang, Y., Cheng, X., and Wang, H. Benchmark dataset Turbulent Channel Flow for Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction, May 2025. doi: 10.5522/04/29118212.v2. URL https://rdr.ucl.ac.uk/articles/dataset/Benchmark_dataset_Turbulent_Channel_Flow_for_Chaos_Meets_Attention_Transformers_for_Large-Scale_Dynamical_Prediction/29118212.
Dataset Splits | Yes | The datasets of vorticity states of the Kolmogorov flow system consist of 150 training trajectories, 40 validation trajectories, and 30 testing trajectories in total; each trajectory contains 500 frames spanning 10 seconds from a unique initial state. The turbulent channel flow datasets comprise 240 training trajectories, 24 validation trajectories, and 24 test trajectories, with each trajectory containing 595 frames.
Hardware Specification | Yes | Computational resources used in each chaos system experiment are listed in Table 6. Table 6. Computational resources by experiment: KF128: 2 GeForce RTX 4090 GPUs (24 GB); KF256: 1 A100 GPU (40 GB); TCF: 1 A100 GPU (40 GB).
Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, specific libraries) are provided in the paper. While some software tools, such as the jax-cfd toolbox (D.1) and the SciPy library (E.2), are mentioned, along with the Adam optimizer (D.4), their versions are not specified.
Experiment Setup | Yes | For the benchmarks, the major hyperparameter configurations of our transformers are listed in Table 5. ... We use Adam as the optimizer to train the models. The learning rate starts from 1e-4 and decays by half every 10 epochs over the 50-epoch run, following the scales and settings of related chaos works (Li et al., 2022a; 2024). All models are trained for 50 epochs using their default batch size settings.
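The stated schedule (initial learning rate 1e-4, halved every 10 of the 50 epochs) can be sketched as a simple step-decay function; the name `lr_at_epoch` is illustrative and not from the paper:

```python
def lr_at_epoch(epoch, base_lr=1e-4, step=10, gamma=0.5):
    """Step decay: the learning rate is multiplied by `gamma`
    every `step` epochs (paper: halved every 10 of 50 epochs)."""
    return base_lr * gamma ** (epoch // step)

# Learning rate at each epoch of the 50-epoch run described in the paper.
schedule = [lr_at_epoch(e) for e in range(50)]
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.5` wrapped around an Adam optimizer, though the paper does not state which framework it uses.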