Schwarz–Schur Involution: Lightspeed Differentiable Sparse Linear Solvers

Authors: Yu Wang, Mazdak Abulnaga, Yaël Balbastre, Bruce Fischl

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We compare our method with direct and iterative solvers and demonstrate significant advantages in runtime. The speedup from our approach can potentially be much more significant than what is reported here: compared with standard solvers that are already highly optimized, our algorithm is naively prototyped in PyTorch, and re-implementing it as low-level operators or optimizing at the hardware level (e.g., CUDA operators) would yield further acceleration. Additionally, as a direct solver, our back-substitution variant enjoys extra speedups in repetitive solves by reusing the numerical factorization, which is impossible for iterative solvers (Table 7).
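The factorization-reuse pattern described in this quote can be illustrated generically with SciPy's sparse LU (this is a stand-in sketch, not the paper's own solver): the factorization is paid for once, and each subsequent right-hand side costs only a back substitution.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in sparse SPD system: a 1-D Poisson matrix.
n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")

lu = spla.splu(A)  # numerical factorization: done once
for seed in range(3):
    # Repeated solves with new right-hand sides reuse the factors;
    # only the cheap back substitution runs inside the loop.
    b = np.random.default_rng(seed).standard_normal(n)
    x = lu.solve(b)
    assert np.allclose(A @ x, b)
```

An iterative solver (e.g., conjugate gradients) has no analogous reusable factorization, which is the asymmetry Table 7 exploits.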
Researcher Affiliation Collaboration Yu Wang (1,2,3), S. Mazdak Abulnaga (1,2), Yaël Balbastre (1,4), Bruce Fischl (1,2). 1: Athinoula Martinos Center, Massachusetts General Hospital, Harvard Medical School; 2: Massachusetts Institute of Technology, MIT CSAIL, USA; 3: Sony, USA; 4: University College London, UK. Correspondence to: <EMAIL>.
Pseudocode Yes Algorithm 1: Schwarz–Schur Involution (overall solve: numerical factorization + back substitution). Algorithm 2: The parallel Schwarz step, forward pass. Algorithm 3: The parallel Schwarz step, backward pass. Algorithm 4: The parallel Schur step, j-th step, j = 2i + 1, forward pass, horizontal. Algorithm 5: The parallel j-th Schur step, j = 2i + 1, backward pass, horizontal.
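The Schur steps in Algorithms 4–5 build on the standard Schur-complement block elimination. A minimal dense sketch of that textbook identity (eliminating "interior" unknowns against "interface" unknowns; the paper applies it in parallel across patches, which this sketch does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_b = 6, 3  # interior and interface block sizes
M = rng.standard_normal((n_i + n_b, n_i + n_b))
A = M @ M.T + (n_i + n_b) * np.eye(n_i + n_b)  # SPD block system

# Partition A into interior (i) and interface (b) blocks.
Aii, Aib = A[:n_i, :n_i], A[:n_i, n_i:]
Abi, Abb = A[n_i:, :n_i], A[n_i:, n_i:]
b = rng.standard_normal(n_i + n_b)
bi, bb = b[:n_i], b[n_i:]

# Schur complement onto the interface, then back-substitute the interior.
S = Abb - Abi @ np.linalg.solve(Aii, Aib)
xb = np.linalg.solve(S, bb - Abi @ np.linalg.solve(Aii, bi))
xi = np.linalg.solve(Aii, bi - Aib @ xb)

x = np.concatenate([xi, xb])
assert np.allclose(A @ x, b)
```

The interior solves for different patches are independent, which is what makes the parallel forward/backward passes of Algorithms 2–5 possible.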
Open Source Code Yes Our code is available at https://github.com/wangyu9/Schwarz_Schur_Involution.
Open Datasets Yes We test on 1024 Darcy flow examples available with the Python package neuraloperator from Li et al. (2024), resampled from 421² to 257².
Dataset Splits No The paper mentions that the Darcy flow examples are resampled from 421² to 257². This describes a data transformation, but the paper does not specify how the data was partitioned into training, validation, or test sets.
Hardware Specification Yes Our experiments are collected on an Nvidia A6000 Ada GPU (48 GB) and an Intel Xeon w9-3475X CPU.
Software Dependencies No The paper mentions several software packages: PyTorch, SciPy, CUDA, PyAMG, CuPy, Julia, Krylov.jl, Nvidia AMGX, SuperLU, UMFPACK, Eigen, CHOLMOD/SuiteSparse, and METIS. However, it does not provide specific version numbers for these tools as used in the authors' own implementation, which is necessary for reproducibility.
Experiment Setup Yes Our method supports both single- and double-precision floating-point numbers (float32 and float64), while existing direct solvers only support float64, or support float32 without major speedups (cuDSS). Implemented in float64, our method often achieves a relative error of 10⁻¹⁶ to 10⁻¹², even better than SciPy (Table 9). Our method can use float32 for further speedup, in which case the errors range from 10⁻⁶ to 10⁻⁵, a typical error tolerance for iterative solvers and sufficient for many tasks. A hyperparameter in our method is the patch size, which we choose as 4×4 (enlarged to 5×5 for duplicating boundary pixels at interfaces), except for the image of size 2561², where we choose 5×5 (enlarged to 6×6). We test all methods on solving the image matting Laplacian A = L + λM (the L in B.3, which is highly anisotropic; λ = 10⁻², 10⁻⁴).
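The float32-versus-float64 accuracy trade-off quoted above can be reproduced in miniature on a generic well-conditioned system (this is not the matting Laplacian, and the exact residual magnitudes depend on conditioning; the error bands below are illustrative assumptions consistent with the quoted ranges):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Diagonally dominant, hence well-conditioned, stand-in system.
A64 = rng.standard_normal((n, n)) + n * np.eye(n)
b64 = rng.standard_normal(n)

x64 = np.linalg.solve(A64, b64)                      # double precision
x32 = np.linalg.solve(A64.astype(np.float32),        # single precision
                      b64.astype(np.float32))

rel64 = np.linalg.norm(A64 @ x64 - b64) / np.linalg.norm(b64)
rel32 = np.linalg.norm(A64 @ x32.astype(np.float64) - b64) / np.linalg.norm(b64)

# float64 residuals sit near machine epsilon; float32 residuals are
# several orders larger but still small enough for many tasks.
assert rel64 < 1e-12
assert rel64 < rel32 < 1e-3
```

On ill-conditioned systems such as the highly anisotropic matting Laplacian, both bands shift upward, which is why the paper reports error ranges rather than single figures.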