Sharpness-Aware Minimization: General Analysis and Improved Rates
Authors: Dimitris Oikonomou, Nicolas Loizou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments validate the theoretical findings and further demonstrate the practical effectiveness of Unified SAM in training deep neural networks for image classification tasks. In Section 4, we present extensive experiments validating different aspects of our theoretical results (behavior of methods in the deterministic setting, importance sampling, and different step-size selections). We also assess the performance of Unified SAM in training deep neural networks for multi-class image classification problems. |
| Researcher Affiliation | Academia | Dimitris Oikonomou CS & MINDS Johns Hopkins University EMAIL Nicolas Loizou AMS & MINDS Johns Hopkins University EMAIL |
| Pseudocode | No | The paper describes the update rules using mathematical formulas and text, such as $x_{t+1} = x_t - \gamma_t \nabla f_{S_t}\left(x_t + \rho_t \frac{\nabla f_{S_t}(x_t)}{\|\nabla f_{S_t}(x_t)\|}\right)$ for SAM, $x_{t+1} = x_t - \gamma_t \nabla f_{S_t}\left(x_t + \rho_t \nabla f_{S_t}(x_t)\right)$ for USAM, and $x_{t+1} = x_t - \gamma_t \nabla f_{S_t}\left(x_t + \rho_t \left(1 - \lambda_t + \frac{\lambda_t}{\|\nabla f_{S_t}(x_t)\|}\right) \nabla f_{S_t}(x_t)\right)$ for Unified SAM, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | An open-source implementation of our method is available at https://github.com/dimitris-oik/unifiedsam. |
| Open Datasets | Yes | The models are trained on the CIFAR-10 and CIFAR-100 datasets, (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | The models are trained on the CIFAR-10 and CIFAR-100 datasets, (Krizhevsky et al., 2009). |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA RTX 6000 Ada GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are explicitly mentioned in the main text. |
| Experiment Setup | Yes | The Unified SAM method is trained using ρ ∈ {0.1, 0.2, 0.3, 0.4} and λt ∈ {0.0, 0.5, 1.0, 1/t, 1 − 1/t}. Following Pang et al. (2021) and Zhang et al. (2024), we set the weight decay to 5 × 10⁻⁴, the momentum to 0.9, and train for 100 epochs. The step size γ is initialized at 0.1 and reduced by a factor of 10 at the 75th and 90th epochs. All models are trained for 200 epochs with a batch size of 128. A cosine scheduler is employed in all cases, with an initial step size of 0.05. The weight decay is set to 0.001. For VaSSO, we use θ = 0.4, as this value provides the best accuracy according to Li & Giannakis (2023). For the CIFAR-10 dataset, we set ρ = 0.1, while for CIFAR-100, we use ρ = 0.2. |
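The Unified SAM update quoted in the pseudocode row interpolates between USAM (λ = 0, unnormalized perturbation) and SAM (λ = 1, normalized perturbation). A minimal sketch of one such step is given below; it is an illustrative reading of the update rule on a toy quadratic, not the authors' released implementation (see their repository above), and the function and variable names are placeholders.

```python
import math

def unified_sam_step(x, grad_fn, gamma, rho, lam):
    """One Unified SAM step: lam=0 recovers USAM, lam=1 recovers SAM.

    x       : current iterate (list of floats)
    grad_fn : stochastic gradient oracle for the current mini-batch
    gamma   : step size, rho: perturbation radius, lam: interpolation weight
    """
    g = grad_fn(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    # Perturbation scale interpolates unnormalized and normalized ascent.
    scale = rho * ((1.0 - lam) + lam / (norm + 1e-12))
    x_adv = [xi + scale * gi for xi, gi in zip(x, g)]
    g_adv = grad_fn(x_adv)  # gradient at the perturbed point
    return [xi - gamma * gi for xi, gi in zip(x, g_adv)]

# Toy example: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
grad = lambda x: list(x)
x = [1.0, -2.0]
for _ in range(100):
    x = unified_sam_step(x, grad, gamma=0.1, rho=0.05, lam=0.5)
```

On this convex toy problem the iterates contract toward the minimizer at the origin; in the deep-learning experiments of the report, `grad_fn` would instead be a mini-batch backward pass.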