Mesh-Informed Neural Operator: A Transformer Generative Approach

Authors: Yaozhong Shi, Zachary E Ross, Domniki Asimaki, Kamyar Azizzadenesheli

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments show that MINO achieves state-of-the-art (SOTA) performance on a diverse suite of benchmarks with regular and irregular grids. We demonstrate through analysis and experiments that Sliced Wasserstein Distance (SWD) and Maximum Mean Discrepancy (MMD) are efficient, robust, dataset-independent metrics for evaluating the performance of functional generative models on regular and irregular grids. In this section we empirically evaluate MINO and other baselines on a suite of functional generative benchmarks under the OFM (Shi et al., 2025) paradigm due to its concise formulation and SOTA performance among functional generative paradigms.
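The response above names SWD and MMD as the paper's evaluation metrics. As a rough illustration of what these two metrics compute — a minimal NumPy sketch on flattened function samples, not the paper's implementation (which uses the POT library for SWD) — the estimators can be written as:

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance.

    x, y: (n_samples, dim) arrays with equal n_samples. Each random
    direction reduces the problem to 1-D optimal transport, which is
    solved exactly by sorting the projected samples.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    theta = rng.normal(size=(n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    xp = np.sort(x @ theta.T, axis=0)
    yp = np.sort(y @ theta.T, axis=0)
    return np.sqrt(np.mean((xp - yp) ** 2))

def mmd_rbf(x, y, gamma=1.0):
    """Biased MMD^2 estimate with an RBF kernel k(a,b) = exp(-gamma * |a-b|^2)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

Both estimators vanish when the two sample sets coincide and grow as the empirical distributions separate, which is what makes them usable as dataset-independent generative metrics.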
Researcher Affiliation | Collaboration | Yaozhong Shi (EMAIL), California Institute of Technology; Zachary E. Ross (EMAIL), California Institute of Technology; Domniki Asimaki (EMAIL), California Institute of Technology; Kamyar Azizzadenesheli (EMAIL), NVIDIA Corporation
Pseudocode | No | The paper includes equations for the model architecture (Eqs. 2-8) and diagrams (Figure 1), but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Python code is available at https://github.com/yzshi5/MINO
Open Datasets | Yes | Navier-Stokes. This dataset contains solutions to the 2D Navier-Stokes equations on a torus at a resolution of 64×64. Following the pre-processing of previous work (Kerrigan et al., 2023b; Shi et al., 2025), we use 30,000 samples for training and 5,000 for testing, drawn from the original dataset introduced in (Li et al., 2021). Shallow Water. This dataset contains solutions to the shallow-water equations for a 2D radial dam-break scenario on a square domain, from PDEBench (Takamoto et al., 2022). Darcy Flow. This dataset contains steady-state solutions of 2D Darcy Flow over the unit square, obtained directly from the PDEBench benchmark (Takamoto et al., 2022). Cylinder Flow. We use the Cylinder Flow dataset of Han et al. (2022), which describes flow past a cylinder on a fixed mesh of 1,699 nodes. Mesh GP. This is a synthetic dataset generated on a fixed irregular mesh of 3,727 nodes provided by (Zhao et al., 2022). Global Climate. We use the real-world global climate dataset from (Dupont et al., 2022), which contains global temperature measurements over the last 40 years.
Dataset Splits | Yes | Navier-Stokes. This dataset contains solutions to the 2D Navier-Stokes equations on a torus at a resolution of 64×64. Following the pre-processing of previous work (Kerrigan et al., 2023b; Shi et al., 2025), we use 30,000 samples for training and 5,000 for testing, drawn from the original dataset introduced in (Li et al., 2021). Shallow Water. Each of the 1,000 simulations has 1,000 time steps at 128×128 resolution; we downsample spatially to 64×64 for efficiency and treat each time step as an independent snapshot. We randomly select 30,000 snapshots for training and 5,000 for testing. Darcy Flow. The dataset contains 10,000 samples and we split it into 8,000 samples for training and 2,000 for testing. Cylinder Flow. From 101 simulations × 400 time steps, we ignore temporal order and treat each time step as an independent sample, randomly selecting 30,000 training and 5,000 testing samples. Mesh GP. We generate function samples from a Gaussian Process (GP) with a Matérn kernel (length scale = 0.4, smoothness factor = 1.5) given the domain, creating a training set of 30,000 samples and a test set of 5,000 samples. Global Climate. The dataset contains 9,676 training samples and 2,420 test samples.
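Several of the splits above are formed by randomly selecting snapshots from a larger pool (e.g. 30,000 training and 5,000 test samples from the Cylinder Flow snapshots). A minimal sketch of such a split — an assumed procedure for illustration; the report does not state the seed or selection code:

```python
import numpy as np

def random_split(n_total, n_train, n_test, seed=0):
    """Randomly partition sample indices into disjoint train/test index sets."""
    assert n_train + n_test <= n_total
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    return perm[:n_train], perm[n_train:n_train + n_test]

# e.g. Cylinder Flow: 101 simulations x 400 time steps = 40,400 snapshots
train_idx, test_idx = random_split(101 * 400, 30_000, 5_000, seed=42)
```

Drawing both sets from a single permutation guarantees the train and test indices are disjoint, which matters here because temporally adjacent snapshots are treated as independent samples.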
Hardware Specification | Yes | To accommodate it on a single NVIDIA RTX A6000 Ada GPU (48 GB memory), we reduced its batch size to 48 and, to maintain a comparable training duration, limited its training to 200 epochs.
Software Dependencies | No | We train all models for 300 epochs using the AdamW optimizer with an initial learning rate of 1e-4. We employ a step learning rate scheduler that decays the learning rate by a gamma of 0.8 every 25 epochs. The default batch size is 96. To evaluate the models, we generate the same number of samples as contained in the test set for each dataset shown in Table 2. All samples are generated by solving the learned ODE numerically using the dopri5 solver from the torchdiffeq library (Chen et al., 2018), with an error tolerance set to 1e-5 for all experiments. In our experiments, we use the official implementation from the POT library (Flamary et al., 2021). The first uses the chordal distance (the Euclidean distance in the R³ embedding space), while the second employs the geodesic distance on S², for which we leverage the Geometric Kernels library (Mostowsky et al., 2024).
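The response names torchdiffeq's dopri5 solver with a 1e-5 tolerance for sampling. dopri5 is the Dormand-Prince 4(5) adaptive Runge-Kutta pair; the same integration scheme can be sketched self-containedly with SciPy's RK45 (an illustrative substitute — the paper uses torchdiffeq, and the toy vector field below stands in for the learned one):

```python
import math
from scipy.integrate import solve_ivp

# Toy vector field standing in for the learned velocity field v(t, u).
def vector_field(t, u):
    return -u  # exact solution: u(t) = u0 * exp(-t)

# SciPy's "RK45" implements the Dormand-Prince 4(5) pair, i.e. the same
# scheme as torchdiffeq's dopri5; tolerances mirror the reported 1e-5.
sol = solve_ivp(vector_field, (0.0, 1.0), [1.0], method="RK45",
                rtol=1e-5, atol=1e-5)
u_final = sol.y[0, -1]  # close to exp(-1)
```

With an adaptive pair like this, the 1e-5 tolerance controls the local error estimate per step, so sample quality is decoupled from any fixed step count.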
Experiment Setup Yes We train all models for 300 epochs using the Adam W optimizer with an initial learning rate of 1e-4. We employ a step learning rate scheduler that decays the learning rate by a gamma of 0.8 every 25 epochs. The default batch size is 96. However, Transolver consumes significantly more GPU memory than other models. To accommodate it on a single NVIDIA RTX A6000 Ada GPU (48 GB memory), we reduced its batch size to 48 and, to maintain a comparable training iteration (duration), limited its training to 200 epochs. To evaluate the models, we generate the same number of samples as contained in the test set for each dataset shown in Table 2. All samples are generated by solving the learned ODE numerically using the dopri5 solver from the torchdiffeq library (Chen et al., 2018), with an error tolerance set to 1e-5 for all experiments. The GNO maps input functions to a latent representation on a 16 16 grid of query points, defined over the [0, 1]2 domain. We set the GNO search radius to 0.07, the latent dimension Ldim to 256, and the number of attention heads to 4. The encoder consists of M1 = 5 cross-attention blocks. For its latent-space processor, we adopt a Diffusion U-Net architecture from the torchcfm library (Tong et al., 2024), which operates on the [16, 16] latent tensor with 64 channels, 1 residual block, and 4 attention heads for the processor.
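The step scheduler described above (gamma 0.8 every 25 epochs, initial rate 1e-4) admits a closed form. A worked example of the stated hyperparameters — not code from the paper, which would typically use torch.optim.lr_scheduler.StepLR:

```python
def lr_at_epoch(epoch, base_lr=1e-4, gamma=0.8, step=25):
    """Learning rate under step decay: base_lr * gamma ** (epoch // step)."""
    return base_lr * gamma ** (epoch // step)

# Over the stated 300 epochs the rate passes through 12 plateaus:
# epochs 0-24 -> 1.0e-4, epochs 25-49 -> 8.0e-5, ..., epochs 275-299 -> ~8.6e-6
```

So by the final plateau the learning rate has decayed to 0.8**11, roughly 8.6% of its initial value.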