Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

Authors: Maksim Zhdanov, Max Welling, Jan-Willem van de Meent

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate Erwin's effectiveness across multiple domains, including cosmology, molecular dynamics, PDE solving, and particle fluid dynamics, where it consistently outperforms baseline methods both in accuracy and computational efficiency.
Researcher Affiliation | Collaboration | ¹AMLab, University of Amsterdam; ²CuspAI. Correspondence to: Maksim Zhdanov <EMAIL>.
Pseudocode | Yes | To highlight the simplicity of our method, we provide the pseudocode:
    # coarsening ball tree
    x = rearrange([x, rel_pos], "(n 2l) d -> n (2l d)") @ Wc
    pos = reduce(pos, "(n 2l) d -> n d", "mean")
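The quoted pseudocode pools pairs of sibling leaves in the ball tree into their parent node. A minimal NumPy sketch of that coarsening step, mirroring the einops patterns in comments (shapes and the names `Wc` / `rel_pos` are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def coarsen(x, pos, Wc):
    """Hedged sketch of ball-tree coarsening; siblings stored contiguously.

    x:   (2n, d)        node features
    pos: (2n, p)        node positions
    Wc:  (2*(d+p), k)   coarsening projection (shape is an assumption)
    """
    two_n, d = x.shape
    n = two_n // 2
    # parent position = mean of its two children  ("(n 2) p -> n p", mean)
    parent_pos = pos.reshape(n, 2, -1).mean(axis=1)
    # position of each child relative to its parent
    rel_pos = pos - np.repeat(parent_pos, 2, axis=0)
    # concat features with relative positions, merge sibling pairs, project
    # ("(n 2) d -> n (2 d)" in einops notation)
    h = np.concatenate([x, rel_pos], axis=-1)   # (2n, d+p)
    x_coarse = h.reshape(n, -1) @ Wc            # (n, k)
    return x_coarse, parent_pos
```

Each coarsening step halves the number of nodes while widening the feature that the projection consumes, which is what lets the hierarchy trade resolution for receptive field.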
Open Source Code | Yes | The code is available at https://github.com/maxxxzdn/erwin.
Open Datasets | Yes | To demonstrate our model's ability to capture long-range interactions, we use the cosmology benchmark (Balla et al., 2024), which consists of large-scale point clouds representing potential galaxy distributions. The molecular dynamics dataset consists of single-chain coarse-grained polymers (Webb et al., 2020; Fu et al., 2022) simulated using MD. We benchmark on multiple datasets taken from Li et al. (2023a). Additionally, we evaluate our model on airflow pressure modeling (Umetani & Bickel, 2018; Alkin et al., 2024a). We use EAGLE (Janny et al., 2023), a large-scale benchmark of unsteady fluid dynamics.
Dataset Splits | Yes | Dataset splits followed the original benchmarks:
    - Cosmology: training set varied from 64 to 8192 examples, with validation and test sets of 512 examples each
    - Molecular dynamics: 100 short trajectories for training, 40 long trajectories for testing
    - PDE benchmarks: 1000 training / 200 test examples (except Plasticity: 900/80)
    - ShapeNet-Car: 700 training / 189 test examples
    - EAGLE: 1184 trajectories with an 80%/10%/10% split
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA RTX A6000 GPU with 48GB memory and 16 AMD EPYC 7543 CPUs.
Software Dependencies | Yes | Erwin and all baselines except those for cosmology were implemented in PyTorch 2.6.
Experiment Setup | Yes | All models were trained using the AdamW optimizer (Loshchilov & Hutter, 2019) with weight decay 10^-5. The learning rate was tuned in the range 10^-4 to 10^-3 to minimize loss on the respective validation sets, with cosine decay to 10^-7. Gradient clipping by norm with value 1.0 was applied across all experiments. Early stopping was used only for ShapeNet-Car and molecular dynamics tasks, while all other models were trained until convergence. In every experiment, we normalize inputs to the model. Hyperparameter optimization was performed using grid search with single trials.
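The described optimization recipe (AdamW with weight decay 10^-5, cosine decay to 10^-7, gradient-norm clipping at 1.0) can be sketched in PyTorch as below; the model, step count, and the peak learning rate of 3e-4 are placeholders, not values from the paper, and the real training loop lives in the authors' repository:

```python
import torch

# Placeholder model; Erwin itself would be instantiated here.
model = torch.nn.Linear(16, 1)

# AdamW with weight decay 1e-5, as described in the setup.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-5)

# Cosine decay of the learning rate down to 1e-7 over training.
total_steps = 10_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-7)

def train_step(batch, target):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    loss.backward()
    # Gradient clipping by norm with value 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    return loss.item()
```

Input normalization and early stopping would sit around this loop in the data pipeline and the outer epoch logic, respectively.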