Tilted Sharpness-Aware Minimization

Authors: Tian Li, Tianyi Zhou, Jeff Bilmes

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks. [...] We empirically demonstrate that TSAM results in flatter solutions and superior generalization performance than SAM and its variants for deep neural networks including transformers on both image and text datasets (Section 5).
Researcher Affiliation Academia ¹University of Chicago, ²University of Maryland, College Park, ³University of Washington. Correspondence to: Tian Li <EMAIL>.
Pseudocode Yes Algorithm 1 Tilted SAM Solver [...] Algorithm 3 Sampling from e^{tL(θ_i+ϵ)} where ‖ϵ‖ ≤ ρ
Open Source Code Yes Our code is publicly available at github.com/litian96/TSAM.
Open Datasets Yes First, we explore training ResNet18 (He et al., 2016) and WideResNet16-8 (Zagoruyko, 2016) on CIFAR100 (Krizhevsky et al., 2009). [...] We study the performance of finetuning ViTs (pretrained on ImageNet (Deng et al., 2009)) on an out-of-distribution Describable Textures Dataset (DTD) (Cimpoi et al., 2014), where the task is 47-class classification. [...] Additionally, we evaluate a 200-class classification task for Tiny ImageNet (Le & Yang, 2015) with ResNet18 and ResNet34 (He et al., 2016) models. Lastly, for text data, we study finetuning a pretrained DistilBERT (Sanh, 2019) model on the GLUE benchmark including both classification and regression problems.
Dataset Splits No The paper mentions specific datasets (CIFAR100, DTD, ImageNet, Tiny Imagenet, GLUE benchmark) but does not explicitly provide details about the train/test/validation splits used for these datasets, such as percentages, sample counts, or references to specific predefined splits.
Hardware Specification No The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or cloud computing specifications).
Software Dependencies No The paper does not list specific software dependencies with version numbers (e.g., specific library versions for PyTorch, TensorFlow, etc.) within the main text or appendix.
Experiment Setup Yes Hyperparameter Tuning. We take µ(ϵ) to be uniform over ‖ϵ‖ ≤ ρ for all TSAM experiments, and tune the ρ parameters separately from {0.05, 0.1, 0.2} for relevant methods. For TSAM, we tune t from {0, 1, 5, 20, 100} and select the best one based on the validation set. [...] We use s=3 or s=5 sampled ϵ's for all datasets and find that it works well. [...] The batch size is 64 for all the datasets and methods and a constant learning rate is tuned from {0.0003, 0.001, 0.003, 0.01, 0.03, 0.1} for each algorithm.
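To make the tilted-aggregation idea behind the quoted setup concrete, the sketch below shows the core step implied by Algorithms 1 and 3: draw s perturbations ϵ with ‖ϵ‖ ≤ ρ, then combine the per-perturbation gradients with exponential weights proportional to e^{tL(θ+ϵ)}. This is a minimal NumPy illustration, not the authors' released code (see github.com/litian96/TSAM for that); the function names and the choice of a uniform ball distribution for µ(ϵ) are assumptions for illustration.

```python
import numpy as np

def sample_perturbations(s, d, rho, rng):
    # Hypothetical mu(eps): s points drawn uniformly from the rho-ball in R^d.
    eps = rng.normal(size=(s, d))
    eps /= np.linalg.norm(eps, axis=1, keepdims=True)   # uniform directions
    radii = rho * rng.random(s) ** (1.0 / d)            # uniform-in-ball radii
    return eps * radii[:, None]

def tilted_aggregate(losses, grads, t):
    """Weighted gradient with tilt t.

    losses: shape (s,), L(theta + eps_i) for each sampled perturbation.
    grads:  shape (s, d), the corresponding gradients.
    t = 0 recovers the plain average over perturbations; larger t
    upweights the worst (highest-loss) perturbations, interpolating
    toward a SAM-like worst-case objective.
    """
    if t == 0:
        weights = np.full(len(losses), 1.0 / len(losses))
    else:
        z = t * (losses - losses.max())      # stabilized softmax weights
        weights = np.exp(z) / np.exp(z).sum()
    return weights @ grads                   # shape (d,)
```

With t = 0 the update reduces to an ordinary averaged-perturbation gradient, and as t grows the weights concentrate on the highest-loss sampled ϵ, which matches the tuning grid t ∈ {0, 1, 5, 20, 100} quoted above.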