Global curvature for second-order optimization of neural networks

Authors: Alberto Bernacchia

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence compared to traditional optimization methods."
Researcher Affiliation | Industry | "MediaTek Research, Cambridge, UK. Correspondence to: Alberto Bernacchia <EMAIL>."
Pseudocode | Yes | "A detailed description of the complete procedure is provided in Algorithm 1 in the Appendix, using the simple case of a two-layer MLP with Tanh activation and no bias."
Open Source Code | Yes | "Code: github.com/mtkresearch/symo notebooks"
Open Datasets | No | "The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean. The covariance matrix of the input is generated using random orthogonal eigenvectors (Mezzadri, 2007), and the eigenvalues are set on a logarithmic grid between 10^-5 and 100."
Dataset Splits | Yes | "The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean."
Hardware Specification | No | "These are matrix-matrix products of size equal to the neural network width, that can be computed efficiently using a GPU."
Software Dependencies | No | "In PyTorch for example, Assumption 2.1 holds for nn.init.normal and nn.init.orthogonal..."
Experiment Setup | Yes | "For all optimizers, the learning rate is set by a grid search. For second-order optimizers, we additionally set a second hyperparameter by grid search: damping λ for KFAC, initialization ϵ for Shampoo, and decay parameter β for SymO."
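The synthetic-data recipe quoted above (zero-mean Gaussian inputs whose covariance has random orthogonal eigenvectors and log-spaced eigenvalues, with a 5000/5000 train/test split) can be sketched as follows. This is an illustrative reconstruction, not the paper's released code: the input dimension `d` and the seed are assumptions, and Haar-distributed orthogonal matrices are drawn via the QR-with-sign-correction method of Mezzadri (2007).

```python
import numpy as np

def random_orthogonal(d, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix,
    with the sign correction described by Mezzadri (2007)."""
    z = rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))  # fix column signs for uniform distribution

def make_synthetic(d=32, n_train=5000, n_test=5000, seed=0):
    """Zero-mean Gaussian inputs whose covariance has random orthogonal
    eigenvectors and eigenvalues on a log grid between 1e-5 and 1e2."""
    rng = np.random.default_rng(seed)
    eigvals = np.logspace(-5, 2, d)          # logarithmic eigenvalue grid
    u = random_orthogonal(d, rng)            # random orthogonal eigenvectors
    sqrt_cov = u * np.sqrt(eigvals)          # U @ diag(sqrt(eigvals))
    # x ~ N(0, U diag(eigvals) U^T)
    x = rng.standard_normal((n_train + n_test, d)) @ sqrt_cov.T
    return x[:n_train], x[n_train:]          # train / test split
```

Sampling via the square-root factor `U diag(sqrt(eigvals))` avoids forming the full covariance matrix and keeps the draw numerically well-conditioned even with eigenvalues spanning seven orders of magnitude.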