Tilted Sharpness-Aware Minimization
Authors: Tian Li, Tianyi Zhou, Jeff Bilmes
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks. [...] We empirically demonstrate that TSAM results in flatter solutions and superior generalization performance than SAM and its variants for deep neural networks including transformers on both image and text datasets (Section 5). |
| Researcher Affiliation | Academia | 1University of Chicago 2University of Maryland, College Park 3University of Washington. Correspondence to: Tian Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Tilted SAM Solver [...] Algorithm 3 Sampling from e^{tL(θ_i+ϵ)} where ‖ϵ‖ ≤ ρ |
| Open Source Code | Yes | Our code is publicly available at github.com/litian96/TSAM. |
| Open Datasets | Yes | First, we explore training ResNet18 (He et al., 2016) and WideResNet16-8 (Zagoruyko, 2016) on CIFAR100 (Krizhevsky et al., 2009). [...] We study the performance of finetuning ViTs (pretrained on ImageNet (Deng et al., 2009)) on an out-of-distribution Describable Texture Dataset (DTD) (Cimpoi et al., 2014), where the task is 47-class classification. [...] Additionally, we evaluate a 200-class classification task for Tiny ImageNet (Le & Yang, 2015) with ResNet18 and ResNet34 (He et al., 2016) models. Lastly, for text data, we study finetuning a pretrained DistilBERT (Sanh, 2019) model on the GLUE benchmark including both classification and regression problems. |
| Dataset Splits | No | The paper mentions specific datasets (CIFAR100, DTD, ImageNet, Tiny Imagenet, GLUE benchmark) but does not explicitly provide details about the train/test/validation splits used for these datasets, such as percentages, sample counts, or references to specific predefined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or cloud computing specifications). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., specific library versions for PyTorch, TensorFlow, etc.) within the main text or appendix. |
| Experiment Setup | Yes | Hyperparameter Tuning. We take µ(ϵ) to be supported on ‖ϵ‖ ≤ ρ for all TSAM experiments, and tune the ρ parameters separately from {0.05, 0.1, 0.2} for relevant methods. For TSAM, we tune t from {0, 1, 5, 20, 100} and select the best one based on the validation set. [...] We use s=3 or s=5 sampled ϵ's for all datasets and find that it works well. [...] The batch size is 64 for all the datasets and methods, and a constant learning rate is tuned from {0.0003, 0.001, 0.003, 0.01, 0.03, 0.1} for each algorithm. |
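The setup row above revolves around TSAM's tilt parameter t and the s sampled perturbations ϵ per step. As a minimal sketch (not the authors' implementation, which is at github.com/litian96/TSAM), the snippet below shows how a tilted aggregate of per-perturbation losses behaves, assuming the tilted objective (1/t) log E_ϵ[e^{tL(θ+ϵ)}] approximated by a mean over s samples; the log-sum-exp shift is a standard stabilization trick:

```python
import numpy as np

def tilted_loss(losses, t):
    """Tilted aggregation (1/t) * log(mean(exp(t * losses))).

    Assumes `losses` holds L(theta + eps_j) for s sampled perturbations.
    t = 0 recovers the plain average over perturbations; larger t
    upweights the worst (sharpest) perturbations, approaching the max.
    A log-sum-exp shift avoids overflow for large t.
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return float(losses.mean())
    m = t * losses
    shift = m.max()
    return float((shift + np.log(np.mean(np.exp(m - shift)))) / t)

# Hypothetical losses from s = 3 sampled perturbations
sample_losses = [0.9, 1.1, 2.0]
avg = tilted_loss(sample_losses, t=0)     # plain mean over samples
sharp = tilted_loss(sample_losses, t=20)  # close to the worst-case loss
```

By Jensen's inequality the tilted value is at least the mean for t > 0 and increases toward the maximum sampled loss as t grows, which matches the paper's tuned grid t ∈ {0, 1, 5, 20, 100} interpolating between average-case and worst-case sharpness.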