Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems

Authors: Jie Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "Empirical evaluation on over 800 matrices suggests that the construction time of these graph neural preconditioners (GNPs) is more predictable and can be much shorter than that of other widely used ones, such as ILU and AMG, while the execution time is faster than using a Krylov method as the preconditioner, such as in inner-outer GMRES."

Researcher Affiliation | Collaboration
  Jie Chen, MIT-IBM Watson AI Lab, IBM Research

Pseudocode | Yes
  Algorithm 1: FGMRES with M being a nonlinear operator

Open Source Code | Yes
  "The implementation of GNP is available at https://github.com/jiechenjiechen/GNP."

Open Datasets | Yes
  "To this end, we turn to the SuiteSparse Matrix Collection (https://sparse.tamu.edu), which is a widely used benchmark in numerical linear algebra."

Dataset Splits | No
  The paper describes how training (b, x) pairs are generated for each matrix ("we sample x from both N(0, Σ_x^m) and N(0, I_n) to form each training batch") and how evaluation matrices are selected ("We select all square, real-valued, and non-SPD matrices whose number of rows falls between 1K and 100K and whose number of nonzeros is fewer than 2M. This selection results in 867 matrices from 50 application areas."). However, it does not specify a train/validation/test split of the 867 matrices for the overall approach, since a separate GNN is trained for each matrix.

Hardware Specification | Yes
  "Our experiments are conducted on a machine with one Tesla V100 (16GB) GPU, 96 Intel Xeon 2.40GHz cores, and 386GB main memory."

Software Dependencies | No
  All code is implemented in Python with PyTorch. The paper mentions software packages such as scipy.sparse.linalg.spilu, SuperLU, PyAMG, and AmgX, but does not provide version numbers for any of these dependencies.

Experiment Setup | Yes
  "We use L = 8 Res-GCONV layers, set the layer input/output dimension to 16, and use 2-layer MLPs with hidden dimension 32 for lifting/projection. We use Adam (Kingma & Ba, 2015) as the optimizer, set the learning rate to 1e-3, and train for 2000 steps with a batch size of 16. We apply neither dropouts nor weight decays. ... We use the ℓ1 residual norm ‖AM(b) − Ax‖_1 as the training loss. ... We use m = 40 Arnoldi steps when sampling the (x, b) pairs according to (5). Among the 16 pairs in a batch, 8 pairs follow (5) and 8 pairs follow x ∼ N(0, I_n)."
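The Pseudocode row refers to the paper's Algorithm 1, FGMRES with a nonlinear operator M. As a rough illustration of how flexible GMRES accommodates such an operator (it stores the preconditioned vectors Z explicitly instead of assuming M is a fixed matrix), here is a minimal NumPy sketch of one restart cycle; the function and variable names are mine, not the paper's:

```python
import numpy as np

def fgmres(A, b, M, m):
    """One cycle of flexible GMRES; M may be any (even nonlinear) operator."""
    n = b.shape[0]
    x0 = np.zeros(n)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))   # Arnoldi basis
    Z = np.zeros((n, m))       # preconditioned vectors (the "flexible" part)
    H = np.zeros((m + 1, m))   # Hessenberg matrix
    V[:, 0] = r0 / beta
    k = m
    for j in range(m):
        Z[:, j] = M(V[:, j])            # apply the (possibly nonlinear) preconditioner
        w = A @ Z[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:         # happy breakdown: exact solution found
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # minimize ||beta*e1 - H y|| and form the flexible correction x0 + Z y
    e1 = np.zeros(k + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[: k + 1, : k], e1, rcond=None)
    return x0 + Z[:, :k] @ y

# demo: unpreconditioned (M = identity) solve of a well-conditioned system
rng = np.random.default_rng(0)
n = 20
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = fgmres(A, b, M=lambda v: v, m=n)
residual = np.linalg.norm(A @ x - b)
```

Because the correction is built from the stored Z columns, M is free to change from step to step, which is exactly what lets a neural network serve as the preconditioner.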
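The quoted experiment setup can be made concrete with a toy PyTorch training loop. Everything below is a simplified stand-in: a dense random matrix replaces the sparse system, a small MLP replaces the 8-layer Res-GCONV network, and the Arnoldi-based half of the batch (equation (5)) is omitted; only the ℓ1 residual loss ‖AM(b) − Ax‖_1, the Adam learning rate, and the batch size mirror the quoted setup:

```python
import torch

torch.manual_seed(0)
n = 64
A = torch.eye(n) + 0.1 * torch.randn(n, n)   # stand-in for the sparse matrix A

# stand-in for the GNP M ~ A^{-1} (the paper uses Res-GCONV layers on the matrix graph)
model = torch.nn.Sequential(
    torch.nn.Linear(n, 32), torch.nn.ReLU(), torch.nn.Linear(32, n)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr = 1e-3 as in the paper

for step in range(20):                  # the paper trains for 2000 steps
    x = torch.randn(16, n)              # batch of 16; here every pair uses x ~ N(0, I_n)
    b = x @ A.T                         # b = A x, so (b, x) is a training pair
    loss = (model(b) @ A.T - b).abs().sum(dim=1).mean()  # l1 residual ||A M(b) - A x||_1
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that the loss never needs the true inverse: since b = Ax, minimizing ‖AM(b) − b‖₁ drives M(b) toward x using only matrix-vector products with A.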