Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation

Authors: Adrian Hill, Guillaume Dalle

Venue: TMLR 2025

Reproducibility Assessment (Variable / Result / LLM Response)
Research Type: Experimental
"On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring." ... "We also provide in-depth benchmarks of SCT as part of a full Automatic Sparse Differentiation (ASD) system in Julia for the computation of Jacobians and Hessians." ... "Section 5 showcases numerical experiments for Jacobian and Hessian sparsity detection, as well as resulting ASD benchmarks."
Researcher Affiliation: Academia
Adrian Hill (EMAIL) — BIFOLD Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; Machine Learning Group, Technical University of Berlin, Berlin, Germany. Guillaume Dalle (EMAIL) — LVMT, ENPC, Institut Polytechnique de Paris, Univ Gustave Eiffel, Marne-la-Vallée, France.
Pseudocode: No
The paper describes algorithms for sparsity detection and propagation rules using mathematical formulas and classifications (e.g., Table 1 and Table 3), but it does not present these as structured pseudocode or algorithm blocks with numbered steps.
Open Source Code: Yes
"To ensure reproducibility, we provide the complete source code and matching Julia environments in our public repository https://github.com/adrhill/sparser-better-faster-stronger/ (Hill, 2025)."
Open Datasets: Yes
"Our test case is a simplified version of the chains dataset (Gu et al., 2020, Appendix E.1)." ... "Power Grid Lib (PGLib) has been developed by Babaeinejadsarookolaee et al. (2021), which can be run via the rosetta-opf (Coffrin and Dowson, 2022) implementation of the ACOPF."
Dataset Splits: No
The paper mentions using a "chains dataset" and describes how the training set was built ("using one positive and one negative chain"), but it does not provide explicit training, validation, or test split percentages, sample counts, or specific predefined split information for any dataset used in the experiments.
Hardware Specification: Yes
"All experiments and benchmarks were run using Julia 1.11 on an Apple M3 Pro CPU with 36 GB of RAM."
Software Dependencies: Yes
"All experiments and benchmarks were run using Julia 1.11 on an Apple M3 Pro CPU with 36 GB of RAM." ... "We implement the tracer number types described in the previous section in an open-source software package called SparseConnectivityTracer.jl (SCT)." ... "The first is SparseMatrixColorings.jl (Montoison et al., 2025). The second is DifferentiationInterface.jl (Dalle and Hill, 2025a)."
Experiment Setup: No
The paper describes the architecture of an implicit GNN layer (e.g., "16 hidden neurons and ReLU activation, followed by dropout and a linear prediction layer with cross-entropy loss") and mentions using "default parameters" for an iterative solver. However, it does not provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or detailed optimizer settings for the experiments conducted.
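For context on the "tracer number types" mentioned under Software Dependencies: the core idea of operator-overloading sparsity detection can be sketched in a few lines. The following is a hypothetical minimal re-implementation in Python, not the SparseConnectivityTracer.jl API — a tracer value carries the set of input indices it depends on, overloaded arithmetic propagates unions of those sets, and evaluating the function on tracers yields a Boolean Jacobian sparsity pattern.

```python
# Hypothetical sketch of index-set sparsity tracing via operator overloading.
# Names (Tracer, jacobian_pattern) are illustrative, not from any real library.

class Tracer:
    """Carries the set of input indices a value depends on."""
    def __init__(self, indices):
        self.indices = frozenset(indices)

    def _union(self, other):
        # Non-tracer operands (constants) contribute no dependencies.
        other_idx = other.indices if isinstance(other, Tracer) else frozenset()
        return Tracer(self.indices | other_idx)

    # Every elementary arithmetic op propagates the union of dependency sets.
    __add__ = __radd__ = _union
    __sub__ = __rsub__ = _union
    __mul__ = __rmul__ = _union
    __truediv__ = __rtruediv__ = _union

def jacobian_pattern(f, n):
    """Boolean Jacobian sparsity pattern of f: R^n -> R^m."""
    x = [Tracer({i}) for i in range(n)]  # seed each input with its own index
    y = f(x)
    return [[j in yi.indices for j in range(n)] for yi in y]

# Example: a banded function — output i depends only on x[i] and x[i+1],
# so the detected pattern is bidiagonal.
def f(x):
    return [x[i] + x[i + 1] for i in range(len(x) - 1)]

pattern = jacobian_pattern(f, 4)
# pattern == [[True, True, False, False],
#             [False, True, True, False],
#             [False, False, True, True]]
```

This global, non-differentiable-safe set propagation is only the simplest flavor of the approach; the paper's system distinguishes more refined propagation rules (e.g. for Hessian patterns) that a sketch like this does not capture.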