MeshMask: Physics-Based Simulations with Masked Graph Neural Networks

Authors: Paul Garnier, Vincent Lannelongue, Jonathan Viquerat, Elie Hachem

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method achieves state-of-the-art results on seven CFD datasets, including a new challenging dataset of 3D intracranial aneurysm simulations with over 250,000 nodes per mesh. Moreover, it significantly improves model performance and training efficiency across such a diverse range of fluid simulation tasks. We demonstrate improvements of up to 60% in long-term prediction accuracy compared to previous best models, while maintaining similar computational costs. Notably, our approach enables effective pre-training on multiple datasets simultaneously, significantly reducing the time and data required to achieve high performance on new tasks. Through extensive ablation studies, we provide insights into the optimal masking ratio, architectural choices, and training strategies.
Researcher Affiliation | Academia | Paul Garnier, Mines Paris PSL University, Centre for Material Forming (CEMEF), CNRS, EMAIL; Vincent Lannelongue, Mines Paris PSL University, Centre for Material Forming (CEMEF), CNRS, EMAIL; Jonathan Viquerat, Mines Paris PSL University, Centre for Material Forming (CEMEF), CNRS, EMAIL; Elie Hachem, Mines Paris PSL University, Centre for Material Forming (CEMEF), CNRS, EMAIL
Pseudocode | No | The paper describes the methodology, architecture, and training process using descriptive text and diagrams (Figures 1, 2, 3, 5) without formal pseudocode or algorithm blocks.
Open Source Code | No | The code used in this paper will be released after publication.
Open Datasets | Yes | Datasets from the COMSOL solver are originally from Pfaff et al. (2021). The MULTIPLE BEZIER dataset is from Garnier et al. (2024). The 3D ANEURYSM dataset is from Goetz et al. (2024b). Given the considerable leap in complexity (both in terms of mesh size, inputs and physics), a more detailed presentation is given below in 3.1.1.
Dataset Splits | Yes | Each training set contains 100 trajectories, and each testing set contains 20 trajectories.
Hardware Specification | No | Table 3: Every model is trained for a total of 1M steps. VRAM and inference time are computed on the CYLINDER dataset. The large increase in parameters in our method is mostly due to the gated MLP and the expansion factor e = 3.

Model | # Training Steps | # Parameters | VRAM (GB) | Inference Time (ms/step)
MGN | 1M | 2.8M | 7 | 49.3
Multigrid | 1M | 3.5M | 11 | 55.2
Multi-Scale GNN | 1M | 2.6M | 8 | 53.7
BSMS-GNN | 1M | 2.1M | 7 | 52.8
MGN w/ masking | 500k + 500k | 2.8M | 7 | 49.3
Ours w/ Gated MLP | 500k + 500k | 9.2M | 16 | 59.1
Software Dependencies | No | The paper mentions several software/solvers (COMSOL, ArcSim, SU2, Cimlib) used for generating datasets, but does not specify their version numbers or any other software dependencies with version numbers for the model implementation itself.
Experiment Setup | Yes | Network Architecture: All of the MLPs (except the Gated MLPs) are made of 2 hidden layers of size 128 with ReLU activation functions. Outputs are normalized with a LayerNorm. The Gated MLPs use a hidden dimension p of size 128 and an expansion factor e = 3. In the case of the Multigrid model, DownScale blocks use a ratio of 0.5. We follow the state of the art, and all our models use a W-cycle with 15 message-passing steps. Training: We trained our models using an L2 loss, with a batch size of 21. During pre-training, the loss is only computed on masked nodes, similar to Devlin et al. (2019); He et al. (2021). We start by pre-training our Encoder and Decoder for 500k training steps, using an exponential learning rate decay from 10^-4 to 10^-6 over the last 250k steps. We then finetune the Encoder for another 500k training steps, with the same strategy for the learning rate. During pre-training, if not specified, we use a node masking ratio of 40% (roughly equivalent to masking 60 to 70% of the mesh information, depending on its geometry). All models are trained using an Adam optimizer (Kingma & Ba, 2017). The baseline models are trained for a million training steps to compare evenly with our models. Finally, following the same strategy as Sanchez-Gonzalez et al. (2020); Pfaff et al. (2021); Garnier et al. (2024), we introduce noise to our inputs. More specifically, we add random noise N(0, σ) to the dynamical variables. The noise magnitudes are presented in Table A.1.2.
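The quoted setup (2-hidden-layer MLPs with ReLU and a LayerNorm on the output, a Gated MLP with hidden dimension 128 and expansion factor 3, and an L2 loss computed only on masked nodes with Gaussian input noise) can be sketched as below. This is a minimal illustration, not the authors' released code: the module names, the exact gating form of the Gated MLP, and the zeroing-out of masked node features are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockMLP(nn.Module):
    """Standard block described in the paper: 2 hidden layers of size 128,
    ReLU activations, LayerNorm on the output."""
    def __init__(self, in_dim, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.LayerNorm(out_dim),
        )

    def forward(self, x):
        return self.net(x)


class GatedMLP(nn.Module):
    """Gated MLP with hidden dimension p=128 and expansion factor e=3.
    The SiLU gating form is an assumption; the paper only gives p and e."""
    def __init__(self, dim=128, expansion=3):
        super().__init__()
        self.up = nn.Linear(dim, dim * expansion)
        self.gate = nn.Linear(dim, dim * expansion)
        self.down = nn.Linear(dim * expansion, dim)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


def masked_pretraining_loss(model, node_feats, targets,
                            mask_ratio=0.4, noise_std=0.01):
    """L2 loss on masked nodes only, with N(0, sigma) noise on the inputs.
    Zeroing masked features is an assumed masking scheme; noise_std stands
    in for the per-dataset sigmas of Table A.1.2."""
    num_nodes = node_feats.shape[0]
    mask = torch.rand(num_nodes) < mask_ratio          # ~40% of nodes masked
    noisy = node_feats + noise_std * torch.randn_like(node_feats)
    noisy[mask] = 0.0                                  # hide masked node features
    pred = model(noisy)
    return ((pred[mask] - targets[mask]) ** 2).mean()  # loss on masked nodes only
```

With this shape, the pre-training loop would pair `masked_pretraining_loss` with Adam and an exponential learning-rate schedule decaying from 1e-4 to 1e-6, as the quoted setup specifies.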