Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Authors: Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate the effectiveness of SASSHA across diverse vision and natural language tasks. Our results reveal that SASSHA consistently achieves flatter minima and attains stronger generalization performance, all compared to existing practical second-order methods, and interestingly, to first-order methods including SGD, AdamW, and SAM. |
| Researcher Affiliation | Academia | POSTECH. Correspondence to: Dahun Shin <EMAIL>, Dongyeop Lee <EMAIL>. |
| Pseudocode | Yes | The exact steps of SASSHA are outlined in Algorithm 1. |
| Open Source Code | Yes | The code to reproduce all results reported in this work is made available for download at https://github.com/LOG-postech/Sassha. |
| Open Datasets | Yes | We first evaluate SASSHA for image classification on CIFAR-10, CIFAR-100, and ImageNet. [...] Specifically, we train GPT2-mini, a scaled-down variant of GPT-2 (Radford et al., 2019), on the Wikitext-2 dataset (Merity et al., 2022) using various methods [...]. We also extend our evaluation to finetuning tasks. Specifically, we finetune SqueezeBERT (Iandola et al., 2020) for diverse tasks in the GLUE benchmark (Wang et al., 2018). |
| Dataset Splits | Yes | We first evaluate SASSHA for image classification on CIFAR-10, CIFAR-100, and ImageNet. [...] We introduced label noise by randomly corrupting a fraction of the training data at rates of 20%, 40%, and 60%. The use of standard benchmark datasets like CIFAR-10/100 and ImageNet implies the use of their well-known, pre-defined train/test/validation splits, which are standard in the field. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It describes experimental settings in terms of datasets, models, and hyperparameters, but omits hardware specifications. |
| Software Dependencies | No | The paper describes experimental settings in detail but does not provide specific ancillary software details, such as library names with version numbers (e.g., PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | Here, we describe our experiment settings in detail. We evaluate SASSHA against AdaHessian (Yao et al., 2021), Sophia-H (Liu et al., 2024), Shampoo (Gupta et al., 2018), SGD, AdamW (Loshchilov & Hutter, 2018), and SAM (Foret et al., 2021) across a diverse set of vision and language tasks. Across all evaluations except for language finetuning, we set the lazy Hessian update interval to k = 10 for SASSHA. [...] All experiments were conducted with a batch size of 256. The hyperparameter search space for each method is detailed in Table 9. |
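The "lazy Hessian update interval k = 10" quoted above means the (expensive) curvature estimate is refreshed only once every k steps and reused in between. A minimal sketch of that idea, assuming a diagonal-preconditioned descent loop; the function names (`grad_fn`, `hess_diag_fn`, `lazy_hessian_descent`) and the toy quadratic are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def lazy_hessian_descent(grad_fn, hess_diag_fn, x0, lr=0.5, k=10, steps=50, eps=1e-8):
    """Toy diagonal-preconditioned descent with a lazy Hessian update:
    the Hessian diagonal is re-estimated only every k steps (hypothetical
    sketch of the interval described in the excerpt, not SASSHA itself)."""
    x = x0.astype(float).copy()
    d = None
    for t in range(steps):
        if t % k == 0:
            # refresh the expensive curvature estimate every k steps
            d = np.abs(hess_diag_fn(x)) + eps
        g = grad_fn(x)
        x -= lr * g / d  # preconditioned step reusing the stale diagonal
    return x

# Ill-conditioned quadratic f(x) = 0.5 * x^T diag(A) x as a toy objective.
A = np.array([1.0, 100.0])
x_final = lazy_hessian_descent(lambda x: A * x,   # exact gradient
                               lambda x: A,       # exact Hessian diagonal
                               np.array([5.0, 5.0]))
```

Reusing a stale diagonal amortizes the Hessian cost over k gradient steps; on this quadratic the curvature is constant, so laziness loses nothing, and each coordinate contracts at the same rate regardless of conditioning.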