Emergence of meta-stable clustering in mean-field transformer models
Authors: Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5 we present some numerical simulations and in Section 6 the conclusions. ... In the center, numerical simulations depict particle trajectories for β = 5 (top) and β = 7 (bottom), with 10^4 particles whose initial conditions are sampled uniformly at random. ... The computation is performed using PyTorch in double precision on a Nvidia Tesla T4 GPU. ... Figure 3 shows the average results from 20 simulations per particle number. |
| Researcher Affiliation | Academia | Giuseppe Bruno¹, Federico Pasqualotto², Andrea Agazzi¹. ¹Department of Mathematics and Statistics, University of Bern; ²Department of Mathematics, University of California, San Diego. Contacts: EMAIL, EMAIL |
| Pseudocode | No | The paper describes mathematical equations and theoretical concepts, but does not include any distinct pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Code is available at this GitHub repository. |
| Open Datasets | No | In the center, numerical simulations depict particle trajectories for β = 5 (top) and β = 7 (bottom), with 10^4 particles whose initial conditions are sampled uniformly at random. ... Given that the results in Theorem 4.2 apply to general perturbations of uniform measures, we consider as initial condition, µ0, the uniform measure slightly perturbed by white noise. |
| Dataset Splits | No | The numerical experiments involve simulations with initial conditions sampled uniformly at random or perturbed by white noise, rather than using pre-existing datasets with explicit training/test/validation splits. |
| Hardware Specification | Yes | The computation is performed using PyTorch in double precision on a Nvidia Tesla T4 GPU. |
| Software Dependencies | No | The computation is performed using PyTorch in double precision on a Nvidia Tesla T4 GPU. No specific version numbers for PyTorch or other software are mentioned. |
| Experiment Setup | Yes | The angular ODE system (10), equivalent to Eq. (USA), is numerically solved using the Euler method with a time step dt = 5 × 10^-4. ... The continuity equation (3), more precisely its angular counterpart Eq. (11), is numerically solved using the Lax-Friedrichs method by discretizing the spatial domain into 10^4 grid points over the interval [0, 2π], with a spatial step dx = 2π / 10^4 and a time step dt = 0.05 dx. ... We fix β = 2 and run multiple simulations with the same setup as the first experiment, with the number of particles ranging from N = 1000 to N = 16000. The simulations are terminated when the approximate total variation distance between the uniform distribution and the token distribution (computed using 100 bins for the histogram) exceeds a fixed threshold. |
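
The particle experiment quoted in the Experiment Setup row (explicit Euler steps on the angular system, stopped once a 100-bin histogram estimate of the total variation distance to the uniform distribution exceeds a threshold) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The drift is an assumed form of the angular dynamics, since Eq. (10) is not reproduced here. The function names and the `threshold` and `max_steps` parameters are hypothetical, and plain Python stands in for the PyTorch implementation the paper mentions.

```python
import math

TWO_PI = 2.0 * math.pi


def euler_step(angles, beta, dt=5e-4):
    """One explicit Euler step of an ASSUMED angular interacting-particle system.

    Assumed drift for particle i (not copied from the paper's Eq. (10)):
        (1/N) * sum_j exp(beta * cos(theta_j - theta_i)) * sin(theta_j - theta_i)
    """
    n = len(angles)
    updated = []
    for ti in angles:
        drift = sum(
            math.exp(beta * math.cos(tj - ti)) * math.sin(tj - ti) for tj in angles
        ) / n
        # Wrap the result back onto [0, 2*pi).
        updated.append((ti + dt * drift) % TWO_PI)
    return updated


def tv_to_uniform(angles, bins=100):
    """Histogram estimate of the total variation distance to the uniform law on the circle."""
    counts = [0] * bins
    for a in angles:
        k = int((a % TWO_PI) / TWO_PI * bins)
        counts[min(k, bins - 1)] += 1  # guard against rounding at the right edge
    n = len(angles)
    # TV between two discrete distributions is half their L1 distance.
    return 0.5 * sum(abs(c / n - 1.0 / bins) for c in counts)


def run_until_threshold(angles, beta, threshold, dt=5e-4, max_steps=10_000):
    """Iterate Euler steps until the TV estimate exceeds `threshold` (hypothetical value).

    Returns the final angles and the number of steps taken.
    """
    for step in range(max_steps):
        if tv_to_uniform(angles) > threshold:
            return angles, step
        angles = euler_step(angles, beta, dt)
    return angles, max_steps
```

The double loop in `euler_step` costs O(N²) per step, so this form is only practical for small N. A vectorized tensor implementation would be the natural choice at the particle counts quoted in the table.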