Curvature-aware Graph Attention for PDEs on Manifolds
Authors: Yunfeng Liao, Jiawen Guan, Xiucheng Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our code is available at https://github.com/Supradax/CurvGT. 5. Experiments Experiment Settings. A dataset is a collection of tuples $(u^{(t)}_1, \ldots, u^{(t)}_m; u^{(t+1)})$ and the associated neural operator is $\mathcal{F}: L^2(\mathcal{M}) \times \cdots \times L^2(\mathcal{M}) \to L^2(\mathcal{M})$. Following the paradigm in (Li et al., 2020a), we assume the input functions $I$ follow a certain distribution $\mu$ and define the loss by $\mathcal{L} := \mathbb{E}_{I \sim \mu}\, \ell(\mathcal{F}(I), u^{(t+1)})$ (20), where $\ell$ is a function that measures the difference between the output $\mathcal{F}(I)$ and the ground truth $u^{(t+1)}$. $\ell$ is usually chosen to be an $L^p$ norm when solving PDEs in Euclidean spaces. Here we adopt the $L^2$ norm $\|\cdot\|_2$ and the Hilbert $H^1$ norm $\|\cdot\|_{H^1(\mathcal{M})}$ since they can be naturally extended to a general manifold $\mathcal{M}$: $\|u(x)\|_2^2 := \int_{x \in \mathcal{M}} u^2(x)\,dx$ (21), $\|u(x)\|_{H^1(\mathcal{M})}^2 := \|u(x)\|_2^2 + \|\nabla u(x)\|_2^2$ (22). To calculate the above integrals, we first compute the mass matrix $M$ with finite element methods. $M$ is a diagonal matrix whose diagonal element $m_i$ is the surface area that vertex $v_i$ takes up. Therefore, we have $\|u(x)\|_2^2 \approx u^{\top} M u$. Likewise, one can adopt a discrete gradient operator $G$ induced by a Whitney basis to take the place of $\nabla$ (Jacobson, 2013), which is indeed a matrix: $\|\nabla u(x)\|_2^2 \approx \sum_{i=1}^{n} m_i \langle f(Gu), f(Gu)\rangle_{g|_{v_i}}$ (23), where $f$ maps the vector field on faces to that on vertices and the inner product in each tangent space at $v_i$ is defined as in constant curvature spaces. Datasets. The previous methods (Li et al., 2023a) select a function $u(x, t)$ on a geometric object and compute its source terms $f(x, t)$ by the corresponding equations to obtain a data tuple $(u^{(t)}, f^{(t)}; u^{(t+1)})$. Such an approach is avoided for the following considerations: (i) popular datasets (Li et al., 2023a) are limited to Euclidean spaces, which does not match the task we focus on; (ii) convergence and accuracy of numerical methods on a general manifold cannot be guaranteed.
Hence, we generate the dataset by selecting a collection of functions on a certain parametric surface in advance, and closed-form computations on parametric manifolds make the dataset more reliable. In this paper, various time-dependent PDEs on different manifolds are studied. More details are available in Appendix B. Baselines and Implementation Details. To showcase the efficacy of our proposed Curvature-aware Attention, we equip it with the Graph Transformer and denote the resulting method as Curv-GT. We evaluate Curv-GT against the following neural PDE solvers: GCN (Xu et al., 2023), GAT (Veličković et al., 2018), DeepONet (Lu et al., 2021; Jin et al., 2022), Graph U-Net (Gao & Ji, 2019), Graph Transformer (Yun et al., 2019), MGKN (Li et al., 2020c), Galerkin-type Attention (Cao, 2021), GNOT (Hao et al., 2023), GINO (Li et al., 2023b), Geo-FNO (Li et al., 2023a) and Transolver (Wu et al., 2024). The implementation details are provided in Appendix C. Main Results. We first study the performance of different methods for PDEs on the wrinkle manifold, a complex manifold containing regions of positive, zero, and negative curvature (shown in Figure 10). To make the comparison more intuitive, the results are shown in the form of a relative loss $\mathcal{L}/\mathcal{L}_{\text{Base}}$, where $\mathcal{L}_{\text{Base}}$ denotes the loss obtained by simply taking the observation $u^{(t)}$ as the prediction at time $t + 1$. As shown in Table 1, our proposed Curv-GT consistently achieves the best results over three wrinkle manifold benchmarks. The non-graph-based models like ResNet and DeepONet struggle with the p-Laplacian Diffusion equation: they predict from node features and global 3D coordinates, failing to capture features from neighbors, and relying solely on 3D coordinates amounts to solving the PDEs in $\mathbb{R}^3$ instead of on $\mathcal{M}$, which loses the geometry completely.
Traditional graph-based models like GT with various types of attention are not able to perceive the geometric structure since these attentions are all based on node features. Spectral methods like GCN and GINO can yield large errors as they neglect the local spatial structures. Geo-FNO and Transolver give rise to relatively small errors among the baseline methods, indicating the usefulness of geometry information. However, the performance gaps between Geo-FNO, Transolver, and Curv-GT also show that directly learning the mapping from parameter space to the manifold, or merely leveraging the extrinsic geometry, is not sufficient for solving PDEs on manifolds. The experiments on more manifolds are available in Appendix D. Ablation on Curvature Geometry. The performances of different methods with and without our proposed curvature-aware attention are reported in Table 2, including GAT, GT, and GNOT. It shows that our proposed curvature-aware attention can outperform the non-curvature-aware counterparts by large margins. Besides, Table 2 also presents the performance of GAT when directly concatenating (Direct Concatenation) the point curvatures to node features and when aggregating information with a shared linear mapping (Linear Mapping). It can be observed that the results are much worse than those of our proposed curvature-aware attention. Ablation on Multi-Head Attention. In this experiment, we aim to verify the efficacy of the proposed multi-head curvature-aware attention. Figure 8 presents the performance change against the number of heads $C$ and the maximum depth of subtrees $d$. It demonstrates that 1) increasing the number of heads can enhance model accuracy and stability, and too small a value limits the model's learning capability; 2) the performance drops if $d$ is too large, because local features cannot be captured well with a large depth. In conclusion, a proper depth should be associated with the specific structure of a discrete manifold.
Training Time Comparison. In this experiment, we run different neural PDE solvers on elliptic paraboloids at different resolutions. Figure 9 presents the average one-epoch training time of each method as the graph size ($|V|$) varies. It can be observed that our proposed Curv-GT is still faster than GINO. In particular, the one-epoch training time over a graph with a size of 2500 is around 15 seconds, which can satisfy the practical requirements. Stability of Subtree Partitioning. The subtree partitioning may be very imbalanced due to random selections. Our 100 trials on a 1,024-node torus show stable results, as shown in Table 3. |
| Researcher Affiliation | Academia | 1School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. Correspondence to: Xiucheng Li <EMAIL>. |
| Pseudocode | No | The paper describes the proposed methods and mathematical formulations using prose and equations (e.g., Section 4.2 Curvature-aware Graph Attention), but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Supradax/CurvGT. |
| Open Datasets | No | Datasets. The previous methods (Li et al., 2023a) select a function $u(x, t)$ on a geometric object and compute its source terms $f(x, t)$ by the corresponding equations to obtain a data tuple $(u^{(t)}, f^{(t)}; u^{(t+1)})$. Such an approach is avoided for the following considerations: (i) popular datasets (Li et al., 2023a) are limited to Euclidean spaces, which does not match the task we focus on; (ii) convergence and accuracy of numerical methods on a general manifold cannot be guaranteed. Hence, we generate the dataset by selecting a collection of functions on a certain parametric surface in advance, and closed-form computations on parametric manifolds make the dataset more reliable. In this paper, various time-dependent PDEs on different manifolds are studied. More details are available in Appendix B. The paper states they generate their own datasets but provides no access information for them. |
| Dataset Splits | Yes | Datasets are partitioned into train-set and test-set by ratio 0.8 randomly. A validation set is then created with a ratio of 0.1 from the train-set. |
| Hardware Specification | Yes | The experiments were conducted on Ubuntu 20.04 LTS equipped with 4 NVIDIA RTX A6000 GPUs, each with 48 GB of GPU memory. |
| Software Dependencies | Yes | Our method is implemented with PyTorch 1.13 and Python 3.12. |
| Experiment Setup | Yes | The initial learning rate $\gamma$ of each model is selected from the set $\{i \cdot 10^{-j} : i \in \{1, 5\},\; j \in \{1, 2, 3\}\}$ to optimize its performance. Besides, the Adam optimizer is used with a decay rate $\beta_1 = 0.9$. Datasets are partitioned into train-set and test-set by ratio 0.8 randomly. A validation set is then created with a ratio of 0.1 from the train-set. The batch size is fixed at either 10 or 20 depending on the graph scale. Specifically, for experiments on the sphere, torus and wrinkle manifold, the batch size is set to 20, while 10 is adopted for the others. All models are trained for 300 epochs. The one behaving best on the validation set is selected to participate in the comparison. The curvature threshold $\varepsilon$ is set to $10^{-3}$, in order to both avoid numerical instability and capture the curvature information of most vertices. All curvature-aware models use a 3-head curvature-aware attention mechanism, resulting in the attention head amount $D = 3$. Guided by Figure 8, a proper subtree partition maximum depth $d$ is set to 4. For instance, with that kind of partition, the wrinkle is then decomposed into 27 subtrees. |
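The quoted evidence evaluates losses with a discrete $L^2$ norm $u^{\top} M u$, where $M$ is a diagonal (lumped) mass matrix whose entry $m_i$ is the surface area attributed to vertex $v_i$. A minimal NumPy sketch of that computation, assuming a triangle mesh given as vertex positions and face indices (the function names here are ours, not from the paper's code):

```python
import numpy as np

def lumped_mass_matrix(vertices, faces):
    """Diagonal (lumped) mass matrix: each vertex receives one third of
    the area of every incident triangle, so m.sum() equals the total
    surface area of the mesh."""
    m = np.zeros(len(vertices))
    for f in faces:
        a, b, c = vertices[f[0]], vertices[f[1]], vertices[f[2]]
        area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))
        m[f] += area / 3.0
    return m

def discrete_l2_norm_sq(u, m):
    """||u||_2^2 ~ u^T M u with M diagonal, stored as the vector m."""
    return float(u @ (m * u))

# Sanity check on a flat unit square (two triangles): the constant
# function u = 1 should have squared norm equal to the surface area, 1.
V = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
F = [[0, 1, 2], [0, 2, 3]]
m = lumped_mass_matrix(V, F)
```

The barycentric (one-third) lumping used here is one common choice; FEM codes also use Voronoi-area lumping, and the paper does not specify which variant it adopts.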
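The main results are reported as a relative loss $\mathcal{L}/\mathcal{L}_{\text{Base}}$, where the baseline simply reuses the observation $u^{(t)}$ as the prediction for $t+1$. A small sketch of that metric under the same diagonal mass matrix convention (helper names are our assumption):

```python
import numpy as np

def relative_loss(pred, u_t, u_next, m):
    """L / L_Base: discrete L2 loss of the model's prediction divided by
    the loss of the persistence baseline that forecasts u(t+1) = u(t)."""
    def l2_sq(v):
        return float(v @ (m * v))  # ||v||^2 ~ v^T M v, M diagonal
    return l2_sq(pred - u_next) / l2_sq(u_t - u_next)
```

Values below 1 mean the solver improves on simply copying the current state; this normalization makes errors comparable across PDEs whose solutions evolve at very different rates.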
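The split and hyperparameter grid quoted in the setup row are straightforward to reproduce. A sketch under the stated ratios (0.8 train/test, then 0.1 of the train set held out for validation) and the learning-rate set $\{i \cdot 10^{-j}\}$; the seed and function names are our assumptions:

```python
import numpy as np

def split_dataset(n, seed=0):
    """Random 80/20 train/test split, then 10% of the train set held
    out as validation, matching the ratios quoted from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.8 * n)
    train, test = idx[:n_train], idx[n_train:]
    n_val = int(0.1 * len(train))
    val, train = train[:n_val], train[n_val:]
    return train, val, test

# Learning-rate grid {i * 10^-j : i in {1, 5}, j in {1, 2, 3}}
LR_GRID = sorted(i * 10.0 ** -j for i in (1, 5) for j in (1, 2, 3))
```

For example, `split_dataset(100)` yields 72 training, 8 validation, and 20 test indices, and `LR_GRID` contains the six candidate rates from 0.001 to 0.5.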