Universal Approximation of Mean-Field Models via Transformers

Authors: Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we empirically demonstrate that transformers are well-suited for approximating a variety of mean-field models, including the Cucker-Smale model for flocking and milling, and the mean-field system for training two-layer neural networks. We validate our numerical experiments via mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the L2 distance between the expected transformer and the infinite-dimensional mean-field vector field can be uniformly bounded by a function of the number of particles observed during training.
Researcher Affiliation | Academia | (1) Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA; (2) Boston College, Boston, MA, USA. Correspondence to: Shiba Biswal <EMAIL>, Rishi Sonthalia <EMAIL>.
Pseudocode | No | The paper includes definitions for Multi-Headed Self-Attention and Transformer Network in Appendix D, but these are descriptive definitions rather than structured pseudocode or algorithm blocks for the overall methodology presented in the main paper.
Open Source Code | Yes | Code can be found at: https://github.com/rsonthal/Mean-Field-Transformers
Open Datasets | Yes | Our first goal focuses on learning the vector field F. Towards this, in this experiment, we use two datasets: first, a synthetic dataset generated from the Cucker-Smale model (Cucker & Smale, 2007), and second, real data of fish milling (Katz et al., 2021). ... Katz, Y., Tunstrøm, K., Ioannou, C. C., Huepe, C., and Couzin, I. D. Fish schooling data subset: Oregon State University. https://ir.library.oregonstate.edu/concern/datasets/zk51vq07c, 2021.
Dataset Splits | Yes | The data so obtained are split into an 80-20-20 split for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using SciPy's `solve_ivp` and the Adam optimizer with cosine annealing, but does not specify version numbers for these or other software components.
Experiment Setup | Yes | Hyperparameters: We consider depths in {3, 4, 5}, widths in {128, 256, 512}, and learning rates in {0.0002, 0.0001, 0.001}. ... We train the models using mini-batch Adam and a cosine annealing learning rate schedule. For the synthetic CS data, we used a batch size of 500 and trained the model for 1000 epochs. For the fish milling data, we used a batch size of 1 and trained the model for 10 epochs. ... We fix the transformer to have a hidden dimension of 512 and 5 layers. We train the model for 250 epochs, using a learning rate of 0.0002 and a batch size of 1000, with Adam and a cosine annealing schedule.
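The Multi-Headed Self-Attention definition the report points to (Appendix D of the paper) follows the standard scaled dot-product form. A minimal single-head sketch in NumPy, where the weight names, shapes, and random inputs are illustrative and not taken from the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head.

    X: (n, d) sequence of n tokens (here, the states of n particles).
    Wq, Wk, Wv: (d, d_k) query/key/value projection matrices.
    Returns the attended values (n, d_k) and the attention matrix (n, n).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # each row is a distribution over tokens
    return A @ V, A

# Illustrative usage with random data.
rng = np.random.default_rng(0)
n, d, dk = 5, 8, 4
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, dk)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

A multi-headed version would run several such heads in parallel and concatenate their outputs before a final linear projection.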
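The report notes that trajectories are generated with SciPy's `solve_ivp`. As a sketch of how that call can simulate the Cucker-Smale model behind the synthetic dataset, assuming the standard communication kernel psi(r) = K / (sigma^2 + r^2)^beta with illustrative parameter values (not the paper's):

```python
import numpy as np
from scipy.integrate import solve_ivp

def cucker_smale_rhs(t, y, N, K=1.0, sigma=1.0, beta=0.5):
    """Cucker-Smale dynamics in 2D:
        dx_i/dt = v_i
        dv_i/dt = (1/N) * sum_j psi(|x_j - x_i|) * (v_j - v_i)
    The state y stacks all positions, then all velocities."""
    x = y[:2 * N].reshape(N, 2)
    v = y[2 * N:].reshape(N, 2)
    diff_x = x[None, :, :] - x[:, None, :]        # (N, N, 2): x_j - x_i
    diff_v = v[None, :, :] - v[:, None, :]        # (N, N, 2): v_j - v_i
    r2 = (diff_x ** 2).sum(axis=-1)               # (N, N) squared distances
    psi = K / (sigma ** 2 + r2) ** beta           # communication weights
    dv = (psi[:, :, None] * diff_v).mean(axis=1)  # velocity alignment term
    return np.concatenate([v.ravel(), dv.ravel()])

rng = np.random.default_rng(0)
N = 10
y0 = np.concatenate([rng.standard_normal(2 * N),   # initial positions
                     rng.standard_normal(2 * N)])  # initial velocities
sol = solve_ivp(cucker_smale_rhs, (0.0, 10.0), y0,
                args=(N,), t_eval=np.linspace(0.0, 10.0, 50))
```

Because the dynamics average velocities, the spread of the velocities shrinks over time, which is the flocking behavior the paper trains transformers to reproduce.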
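The cosine annealing schedule quoted in the experiment setup decays the learning rate as eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T)) / 2. A minimal sketch, where the 250-epoch horizon and initial rate 0.0002 mirror the quoted setup and eta_min = 0 is an assumption:

```python
import math

def cosine_annealing_lr(t, T, eta_max, eta_min=0.0):
    """Learning rate at epoch t out of T under cosine annealing (no restarts)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / T))

# Mirroring the quoted setup: initial learning rate 0.0002 over 250 epochs.
schedule = [cosine_annealing_lr(t, 250, 2e-4) for t in range(251)]
```

The rate starts at eta_max, decreases monotonically, and reaches eta_min at epoch T; framework implementations such as PyTorch's `CosineAnnealingLR` follow the same curve.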