Transformer for Partial Differential Equations’ Operator Learning
Authors: Zijie Li, Kazem Meidani, Amir Barati Farimani
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first present the benchmark results on the standard neural operator benchmark problems from Li et al. (2021a) and compare our model with several other state-of-the-art operator learning frameworks, including the Galerkin/Fourier Transformer (G.T./F.T.) from Cao (2021) and the Multiwavelet-based Operator (MWT) (Gupta et al., 2021). Then we showcase the model's application to irregular grids, where the above frameworks cannot be directly applied, and compare the model to a graph neural network baseline (Lötzsch et al., 2022). We also perform an analysis of the model's latent encoding. Finally, we present an ablation study of key architectural choices. (The full details of the model architecture on different problems and the training procedure are provided in Appendix A. More ablation studies of hyperparameter choices are provided in Appendix C.) |
| Researcher Affiliation | Academia | Zijie Li EMAIL Department of Mechanical Engineering Carnegie Mellon University Kazem Meidani EMAIL Department of Mechanical Engineering Carnegie Mellon University Amir Barati Farimani EMAIL Department of Mechanical Engineering Carnegie Mellon University |
| Pseudocode | No | The paper describes the model architecture and mechanisms using mathematical equations and figures (e.g., Figure 1 and Figure 2), but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured, code-like steps for any specific procedure in text. |
| Open Source Code | Yes | Code is available at: https://github.com/BaratiLab/OFormer. |
| Open Datasets | Yes | The following datasets are used as benchmarks in Li et al. (2021a); Cao (2021); Gupta et al. (2021), where the data is generated using an equi-spaced simulation grid and then downsampled to create datasets with different resolutions. Dataset and numerical solver courtesy: https://github.com/zongyi-li/fourier_neural_operator The data is generated by solving the above equation using the finite element method implemented with the FEniCS library (Alnæs et al., 2015). The element used is a quadratic triangle element. We use the pre-generated data from Lötzsch et al. (2022) for the experiment. Dataset and numerical solver courtesy: https://github.com/merantix-momentum/gnn-bvp-solver We use the pre-generated dataset from Pfaff et al. (2020) to carry out the experiment. Dataset courtesy: https://github.com/deepmind/deepmind-research/tree/master/meshgraphnets |
| Dataset Splits | Yes | For the size of each Navier-Stokes dataset, NS2-full contains 9800/200 (train/test) samples; NS-mix is a larger dataset consisting of 10000/1000 samples; each of the remaining datasets contains 1000/200 samples. For the Burgers equation and Darcy flow, the dataset splitting is the same as in Cao (2021). The training set contains grids of four shapes (square, circle with or without a hole, L-shape) with 8000 samples, while the testing set has 2000 samples consisting of U-shape grids that the model has never seen during training. The dataset contains 1000/100 sequences with different inflow speeds (Mach numbers) and angles of attack. |
| Hardware Specification | Yes | The benchmark was conducted on a RTX-3090 GPU using PyTorch 1.8.1 (1.7 for MWT) and CUDA 11.0. |
| Software Dependencies | Yes | All the models are implemented in PyTorch (Paszke et al., 2019). The benchmark was conducted on a RTX-3090 GPU using PyTorch 1.8.1 (1.7 for MWT) and CUDA 11.0. |
| Experiment Setup | Yes | We use GELU (Hendrycks & Gimpel, 2016) as the activation function for the propagator and decoder, and opt for Gated-GELU (Dauphin et al., 2016) for the FFN. On 1D Burgers, we train the model for 20k iterations using a batch size of 16. On Darcy flow, we train the model for 32k iterations using a batch size of 8. For the small Navier-Stokes datasets (consisting of 1000 training samples each), we train the model for 32k iterations using a batch size of 16, and for 128k iterations on the large datasets (consisting of roughly 10k training samples each). On Electrostatics/Magnetostatics we train the model for 32k iterations using a batch size of 16. On Airfoil we train the model for 50k iterations using a batch size of 10. |
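The Experiment Setup row mentions a Gated-GELU activation for the FFN. As a point of reference, below is a minimal PyTorch sketch of one common gated-GELU feed-forward formulation (a GLU-style gate in the spirit of Dauphin et al., 2016, with GELU as the gate nonlinearity); the class name, layer widths, and exact gating arrangement are assumptions for illustration, not the authors' OFormer implementation.

```python
import torch
import torch.nn as nn


class GatedGELUFFN(nn.Module):
    """Feed-forward block with a gated-GELU activation (hypothetical sketch).

    Projects the input to two parallel branches (value and gate),
    applies GELU to the gate, multiplies elementwise, and projects back.
    """

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # Single linear layer producing both the value and gate branches.
        self.proj_in = nn.Linear(dim, hidden_dim * 2)
        self.act = nn.GELU()
        self.proj_out = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * self.act(gate))
```

The output shape matches the input, so the block can drop into a transformer layer wherever a plain `Linear -> GELU -> Linear` FFN would go.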