Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods
Authors: Andres Fernandez, Frank Schneider, Maren Mahsereci, Philipp Hennig
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments reveal an overlap between magnitude parameter masks and top Hessian eigenspaces consistently higher than chance-level, and that this effect gets accentuated for larger network sizes. This result indicates that top Hessian eigenvectors tend to be concentrated around larger parameters, or equivalently, that larger parameters tend to align with directions of larger loss curvature. Our work provides a methodology to approximate and analyze deep learning Hessians at scale, as well as a novel insight on the structure of their eigenspace. |
| Researcher Affiliation | Collaboration | Andres Fernandez EMAIL Tübingen AI Center University of Tübingen Frank Schneider EMAIL Tübingen AI Center University of Tübingen Maren Mahsereci EMAIL Yahoo Research Philipp Hennig EMAIL Tübingen AI Center University of Tübingen |
| Pseudocode | Yes | Algorithm 1: SSVD (from Tropp et al. (2019)); Algorithm 2: SEIGH |
| Open Source Code | Yes | To efficiently compute overlap, we develop SEIGH (Section 5 and Alg. 2), a matrix-free eigendecomposition based on sketched SVDs (Tropp et al., 2019). Our open source implementation allows computing top-k Hessian eigendecompositions for k > 10³ on neural networks with over 10M parameters, an unprecedented scale by orders of magnitude. https://github.com/andres-fr/hessian_overlap |
| Open Datasets | Yes | MLP on 16×16 MNIST; ResNet-18 on ImageNet; 3c3d-CNN on CIFAR-10 (Schneider et al., 2019); All-CNN-C on CIFAR-100 (Springenberg et al., 2015) |
| Dataset Splits | Yes | Table 1: Overview of experimental settings, detailing number of model parameters (D), learning rate (η), batch size (B), steps per epoch (T), test accuracy (acc) at step t, number of train/test samples used to compute Htrain/Htest (Ntrain/Ntest respectively), and number of SEIGH outer measurements (nO, see Alg. 2). 16×16 MNIST, tanh-MLP (Martens & Grosse, 2015): D=7030, η=0.3, B=500, T=100, acc=95.78% (t=1000), Ntrain/Ntest=500/500, nO=355. CIFAR-10, 3c3d-CNN (Schneider et al., 2019): D=895,210, η=0.0226, B=128, T=312, acc=74.52% (t=8000), Ntrain/Ntest=500/500, nO=1000. CIFAR-100, All-CNN-C (Springenberg et al., 2015): D=1,387,108, η=0.1658, B=256, T=156, acc=40.50% (t=8000), Ntrain/Ntest=n.a./1000, nO=1000. ImageNet, ResNet-18 (He et al., 2016): D=11,689,512, η=0.1, B=150, T=8207, acc=17.33% (t=8000), Ntrain/Ntest=n.a./5000, nO=1500. |
| Hardware Specification | Yes | Figure 18: Runtimes of main SEIGH operations to compute a single Hessian eigendecomposition, assuming a single computer with 400GB RAM equipped with an NVIDIA A100 (40GB) graphics card. |
| Software Dependencies | No | We used PyTorch (Paszke et al., 2019) and CurvLinOps. |
| Experiment Setup | Yes | Table 1: Overview of experimental settings, detailing number of model parameters (D), learning rate (η), batch size (B), steps per epoch (T), test accuracy (acc) at step t, number of train/test samples used to compute Htrain/Htest (Ntrain/Ntest respectively), and number of SEIGH outer measurements (nO, see Alg. 2). 16×16 MNIST, tanh-MLP (Martens & Grosse, 2015): D=7030, η=0.3, B=500, T=100, acc=95.78% (t=1000), Ntrain/Ntest=500/500, nO=355. CIFAR-10, 3c3d-CNN (Schneider et al., 2019): D=895,210, η=0.0226, B=128, T=312, acc=74.52% (t=8000), Ntrain/Ntest=500/500, nO=1000. CIFAR-100, All-CNN-C (Springenberg et al., 2015): D=1,387,108, η=0.1658, B=256, T=156, acc=40.50% (t=8000), Ntrain/Ntest=n.a./1000, nO=1000. ImageNet, ResNet-18 (He et al., 2016): D=11,689,512, η=0.1, B=150, T=8207, acc=17.33% (t=8000), Ntrain/Ntest=n.a./5000, nO=1500. |
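The core machinery the paper describes, a matrix-free sketched eigendecomposition of the Hessian plus an overlap score between a top-k magnitude mask and the top-k eigenspace, can be illustrated in miniature. The sketch below is a simplified randomized eigensolver in the style of Tropp et al. (2019), not the authors' exact SEIGH (Alg. 2), and the `matvec`/`mask_overlap` names are illustrative assumptions; it only shows the shape of the computation on a small dense surrogate for the Hessian.

```python
import numpy as np


def sketched_top_eigs(matvec, dim, k, n_oversample=10, seed=0):
    """Approximate top-k eigenpairs (by magnitude) of a symmetric operator,
    accessed only through matrix-vector products, via a randomized range
    sketch. Simplified stand-in for the paper's SEIGH, not its exact algorithm."""
    rng = np.random.default_rng(seed)
    # Random Gaussian test matrix: k + oversampling "measurements"
    omega = rng.standard_normal((dim, k + n_oversample))
    # Sketch the operator's range with one pass of matvecs
    Y = np.column_stack([matvec(omega[:, i]) for i in range(omega.shape[1])])
    Q, _ = np.linalg.qr(Y)  # orthonormal basis for the sketched range
    # Project the operator into the sketched subspace and solve the small problem
    B = np.column_stack([matvec(Q[:, i]) for i in range(Q.shape[1])])
    T = Q.T @ B  # small (k + p) x (k + p) symmetric matrix
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(np.abs(evals))[::-1][:k]  # keep top-k by magnitude
    return evals[order], Q @ evecs[:, order]


def mask_overlap(params, eigvecs, k):
    """Fraction of top-k eigenspace energy captured by the top-k magnitude
    mask of `params` (1/k * sum of squared eigenvector entries on the mask)."""
    mask_idx = np.argsort(np.abs(params))[::-1][:k]
    mask = np.zeros(len(params), dtype=bool)
    mask[mask_idx] = True
    return np.sum(eigvecs[mask, :] ** 2) / k


# Usage on a synthetic symmetric "Hessian" with a few dominant directions.
rng = np.random.default_rng(0)
n = 60
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.concatenate([[100.0, -80.0, 60.0, 40.0, 20.0],
                    0.1 * rng.standard_normal(n - 5)])
H = (V * d) @ V.T  # H = V diag(d) V^T, symmetric and indefinite
evals, evecs = sketched_top_eigs(lambda v: H @ v, n, k=5, n_oversample=15)
params = rng.standard_normal(n)  # stand-in for trained network weights
overlap = mask_overlap(params, evecs, k=5)
```

The real SEIGH additionally amortizes the measurements over HDF5-backed storage so the procedure scales to D > 10M parameters, where the Hessian is only available through autodiff Hessian-vector products.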