Global curvature for second-order optimization of neural networks
Authors: Alberto Bernacchia
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence compared to traditional optimization methods. |
| Researcher Affiliation | Industry | 1MediaTek Research, Cambridge, UK. Correspondence to: Alberto Bernacchia <EMAIL>. |
| Pseudocode | Yes | A detailed description of the complete procedure is provided in Algorithm 1 in the Appendix, using the simple case of a two-layer MLP with Tanh activation and no bias. |
| Open Source Code | Yes | Code: github.com/mtkresearch/symo notebooks |
| Open Datasets | No | The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean. The covariance matrix of the input is generated using random orthogonal eigenvectors (Mezzadri, 2007), and the eigenvalues are set on a logarithmic grid between 10^-5 and 10^0. |
| Dataset Splits | Yes | The synthetic dataset consists of 5000 training and 5000 testing data points, where the input is sampled from a Gaussian distribution with zero mean. |
| Hardware Specification | No | These are matrix-matrix products of size equal to the neural network width, that can be computed efficiently using a GPU. |
| Software Dependencies | No | In PyTorch for example, Assumption 2.1 holds for nn.init.normal and nn.init.orthogonal... |
| Experiment Setup | Yes | For all optimizers, learning rate is set by a grid search. For second-order optimizers, we additionally set a second hyperparameter by grid search: damping λ for KFAC, initialization ϵ for Shampoo and decay parameter β for SymO. |
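The synthetic dataset described above (zero-mean Gaussian inputs whose covariance has random orthogonal eigenvectors and eigenvalues on a logarithmic grid) can be reproduced with a short script. The sketch below is an illustration, not the authors' code: the function name `make_synthetic_dataset` and the input dimension are assumptions, and the Haar-distributed orthogonal matrix is drawn via the QR sign-correction trick of Mezzadri (2007) as cited in the paper.

```python
import numpy as np

def make_synthetic_dataset(n_train=5000, n_test=5000, dim=100, seed=0):
    """Zero-mean Gaussian inputs with a controlled covariance spectrum:
    random orthogonal eigenvectors, eigenvalues on a log grid in [1e-5, 1].
    `dim` is an assumed value; the paper's report does not quote it here."""
    rng = np.random.default_rng(seed)
    # Haar-random orthogonal matrix: QR of a Gaussian matrix, with column
    # signs fixed by the diagonal of R (Mezzadri, 2007).
    A = rng.standard_normal((dim, dim))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))
    # Eigenvalues on a logarithmic grid between 10^-5 and 10^0.
    eigvals = np.logspace(-5, 0, dim)
    # x = Q diag(sqrt(eigvals)) z with z ~ N(0, I), so Cov(x) = Q diag(eigvals) Q^T.
    sqrt_cov = Q * np.sqrt(eigvals)  # scales column j of Q by sqrt(eigval_j)
    z = rng.standard_normal((n_train + n_test, dim))
    X = z @ sqrt_cov.T
    return X[:n_train], X[n_train:]
```

With this construction the empirical covariance of the samples converges to the target spectrum, which is what makes the problem's conditioning controllable for the optimizer comparison.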