Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects
Authors: Ke Liang Xiao, Noah Marshall, Atish Agarwala, Elliot Paquette
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to ADAM. ... Numerical simulations suggest that in practice this correspondence is strong even by d = 500 (Figure 1 (a), (b)). |
| Researcher Affiliation | Collaboration | 1Department of Mathematics and Statistics, McGill University, Montreal, Canada 2Google DeepMind, Mountain View, United States of America. |
| Pseudocode | No | The paper describes mathematical equations, definitions, and theorems, including update rules and SDEs/ODEs, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce these results is available at https://anonymous.4open.science/r/signSGD-6216/. |
| Open Datasets | Yes | CIFAR10 (Krizhevsky, 2009) data was used to perform binary classification... The IMDB dataset (Maas et al., 2011) was first embedded using GLOVE (Pennington et al., 2014) into dimension 50. |
| Dataset Splits | No | The paper mentions removing the 'frog class' from CIFAR10 to retain 'balanced classes' and discusses 'synthetic data' generation, but it does not provide specific details on training, validation, or test splits for any dataset. |
| Hardware Specification | Yes | The experiments creating Figure 1 were carried out on an M1 Macbook Air. |
| Software Dependencies | No | The paper mentions using 'Sci-kit learn (Pedregosa et al., 2011)' and 'GLOVE (Pennington et al., 2014)' but does not specify the version numbers of these software components. |
| Experiment Setup | Yes | Table 1: A summary of experimental details of Figure 1 (columns: Dataset, Learning Rate (η), Dimension, Noise Distribution, Noise Details, # Iterations). The learning rates are taken to be optimal, which is to say ηd/tr(K) for both SGD and SIGNSGD, for a fixed multiple η. The constant is given by η = 0.01. |
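To make the quoted setup concrete, here is a minimal sketch (not the paper's released code) of the standard signSGD update on a streaming least-squares problem, using the learning-rate scaling lr = η·d/tr(K) with η = 0.01 described above. The diagonal covariance spectrum, the target vector `w_star`, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical setup: these choices (spectrum, target, step count) are
# illustrative and not drawn from the paper's Table 1.
rng = np.random.default_rng(0)
d, n_steps, eta = 500, 1000, 0.01

K_diag = np.linspace(0.5, 2.0, d)      # assumed data covariance spectrum
K = np.diag(K_diag)
w_star = rng.normal(size=d)            # assumed target weights
w = np.zeros(d)

# Learning-rate scaling quoted in the Experiment Setup row: eta * d / tr(K)
lr = eta * d / np.trace(K)

risk0 = 0.5 * (w - w_star) @ K @ (w - w_star)  # population risk at init
for _ in range(n_steps):
    x = rng.normal(size=d) * np.sqrt(K_diag)   # fresh sample with covariance K
    grad = (x @ (w - w_star)) * x              # one-sample least-squares gradient
    w -= lr * np.sign(grad)                    # signSGD: keep only coordinate signs
risk = 0.5 * (w - w_star) @ K @ (w - w_star)   # risk after training
```

Dropping `np.sign` recovers plain SGD, which is the comparison the paper's risk curves quantify; the sign nonlinearity is what produces the preconditioning and noise-compression effects in the title.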