Sharpness-Aware Minimization and the Edge of Stability
Authors: Philip M. Long, Peter L. Bartlett
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis. |
| Researcher Affiliation | Collaboration | Philip M. Long and Peter L. Bartlett, Google, 1600 Amphitheatre Parkway, Mountain View, CA 94040. Also affiliated with University of California, Berkeley. |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It provides mathematical derivations and descriptions of algorithms in prose. |
| Open Source Code | Yes | Code is available (Long and Bartlett, 2024). P. M. Long and P. L. Bartlett. Sam and the edge of stability. https://github.com/google-deepmind/sam_edge, 2024. |
| Open Datasets | Yes | Our first experiments are with fully connected networks on MNIST. Next, we experiment with a convolutional neural network training on 1000 examples from CIFAR10. Finally, we experiment with a standard Transformer architecture training a language model on tiny_shakespeare using the more practical version of SAM that uses stochastic gradients. |
| Dataset Splits | Yes | The last 10000 lines of tiny_shakespeare were set aside as a test set, and the remaining data was used for training. |
| Hardware Specification | Yes | We trained for eight hours of wallclock time on a V100 GPU. Training was performed for 12 hours on a V100 GPU. |
| Software Dependencies | Yes | We coded our experiments using Jax (Bradbury et al., 2018), along with Flax (Heek et al., 2023) (for the image classification experiments), and Haiku (Hennigan et al., 2020) (for the language model experiments). JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax. Version 0.3.13. Flax: A neural network library and ecosystem for JAX, 2023. URL http://github.com/google/flax. Version 0.7.2. Haiku: Sonnet for JAX, 2020. URL http://github.com/deepmind/dm-haiku. Version 0.0.10. |
| Experiment Setup | Yes | We trained once for each combination of the following hyperparameters: learning rates η: 0.03, 0.1, 0.3; SAM offsets ρ (see (1)): 0.0, 0.1, 0.3, 1.0. For CIFAR10, learning rates: 0.0003, 0.001, 0.003, 0.01; ρ values: 0.0, 0.1, 0.3, 1.0. For tiny_shakespeare, learning rates: 0.01, 0.02, 0.05, 0.1, 0.2, 0.5; ρ values: 0.0, 0.1, 0.3, 1.0. ...training an autoregressive character language model using the tiny_shakespeare dataset, using minibatches of size 128. |
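For context on what the η and ρ hyperparameters above control, here is a minimal sketch of one full-batch SAM update in JAX (the library the paper's experiments use). This follows the standard SAM rule — perturb the weights by ρ in the normalized gradient direction, then take a gradient step of size η from the perturbed point — and uses a toy quadratic loss of our own choosing, not the paper's models.

```python
# Sketch of one SAM step: eps = rho * g / ||g||, then w <- w - eta * grad L(w + eps).
# The quadratic loss and parameter values are illustrative assumptions.
import jax
import jax.numpy as jnp


def loss(w):
    # Toy loss for illustration only.
    return 0.5 * jnp.sum(w ** 2)


def sam_step(w, eta, rho):
    g = jax.grad(loss)(w)
    # Ascent offset of norm rho in the gradient direction; the small
    # constant guards against division by zero at a critical point.
    eps = rho * g / (jnp.linalg.norm(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed point.
    g_sam = jax.grad(loss)(w + eps)
    return w - eta * g_sam


w = jnp.array([1.0, -2.0])
w_next = sam_step(w, eta=0.1, rho=0.1)
```

Setting ρ = 0.0 (as in the baseline rows of the hyperparameter grids above) makes `eps` zero and recovers plain gradient descent.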