Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

Authors: Sicong Li, Qianqian Xu, Zhiyong Yang, Zitai Wang, Linchao Zhang, Xiaochun Cao, Qingming Huang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on both traditional and foundation models validate the effectiveness of Focal-SAM. ... Finally, we conduct extensive experiments on various benchmark datasets to validate the effectiveness of Focal-SAM, including training ResNet models from scratch and finetuning the foundation model CLIP (Radford et al., 2021).
Researcher Affiliation Collaboration 1Institute of Information Engineering, CAS 2School of Cyber Security, University of Chinese Academy of Sciences 3Key Lab. of Intelligent Information Processing, Institute of Computing Tech., CAS 4School of Computer Science and Tech., University of Chinese Academy of Sciences 5Artificial Intelligence Institute of China Electronics Technology Group Corporation, 6School of Cyber Science and Tech., Shenzhen Campus of Sun Yat-sen University 7BDKM, University of Chinese Academy of Sciences. Correspondence to: Qianqian Xu <EMAIL>, Qingming Huang <EMAIL>.
Pseudocode Yes Overall, Alg. 1 gives the pseudo-code to optimize the Focal-SAM objective, using SGD as the base optimizer. Algorithm 1 Focal-SAM algorithm. Input: Training set S, perturbation radius ρ, hyperparameters λ, γ, learning rate η. Output: Model trained with Focal-SAM.
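The quoted algorithm describes a SAM-style procedure with SGD as the base optimizer. As a rough illustration only, the sketch below shows the generic perturb-then-update structure such a procedure follows on a toy quadratic loss; the actual Focal-SAM objective and its λ, γ hyperparameters are not reproduced here, and the function names are our own.

```python
import numpy as np

def loss(w, X, y):
    # mean squared error for a toy linear model (stand-in for the real objective)
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # gradient of the toy loss above
    return X.T @ (X @ w - y) / len(y)

def sam_sgd_step(w, X, y, rho=0.05, eta=0.1):
    """One SAM-style step: perturb the weights along the ascent direction
    (scaled to radius rho), then take an SGD step using the gradient
    evaluated at the perturbed point."""
    g = grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation within radius rho
    g_adv = grad(w + eps, X, y)                  # gradient at the perturbed weights
    return w - eta * g_adv                       # base SGD update with learning rate eta

# Toy data: recoverable linear target, so the iterates should approach w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(200):
    w = sam_sgd_step(w, X, y)
```

This only demonstrates the two-gradient structure (one to build the perturbation, one to update); Focal-SAM's class-wise focal weighting of the sharpness term is what the λ and γ hyperparameters control in the paper's Algorithm 1.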
Open Source Code No The paper does not provide any explicit statements about code availability, a direct link to a code repository, or mention of code in supplementary materials.
Open Datasets Yes Datasets. We use four widely adopted long-tailed datasets for long-tailed recognition tasks: CIFAR-10 LT (Cao et al., 2019), CIFAR-100 LT (Cao et al., 2019), ImageNet-LT (Liu et al., 2019) and iNaturalist (Horn et al., 2018). ... Specifically, we train the model on ImageNet-LT and evaluate it on three OOD datasets: ImageNet-Sketch (Wang et al., 2019a), ImageNetV2 (Recht et al., 2019), and ImageNet-C (Hendrycks & Dietterich, 2019).
Dataset Splits Yes CIFAR-100 LT and CIFAR-10 LT (Cao et al., 2019). The original CIFAR-100 (Krizhevsky & Hinton, 2009) and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets contain 50,000 training images and 10,000 testing images for 100 and 10 classes, respectively. ... ImageNet-LT (Liu et al., 2019). ... includes 115,846 training images and 50,000 test images. ... iNaturalist (Horn et al., 2018). ... The training set contains approximately 430,000 images, while the test set contains about 24,000 images.
Hardware Specification Yes C.5. Experimental Hardware Setup All the experiments are conducted on Ubuntu servers equipped with Nvidia(R) RTX 3090 GPUs and RTX 4090 GPUs. Fine-tuning the foundation models is performed using a single GPU for all datasets. The number of GPUs used for training the ResNet models from scratch varies based on dataset size: a single GPU for the CIFAR-LT datasets, 2 GPUs for the ImageNet-LT dataset, and 4 GPUs for the iNaturalist dataset.
Software Dependencies No The paper mentions using 'SGD as the base optimizer' but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes C.4. Implementation Details ... We employ stochastic gradient descent (SGD) as the base optimizer, with an initial learning rate of 0.1, a batch size of 64, and a momentum of 0.9. Training spans 200 epochs, using a cosine annealing scheduler to reduce the learning rate from 0.1 to 0 gradually. ... For ImageNet-LT, the initial learning rate is set to 0.1, with a batch size of 256, while for iNaturalist, the initial learning rate is 0.2, and the batch size is increased to 512. Training for these datasets also lasts 200 epochs with a cosine annealing scheduler. ... The initial learning rate is 0.01 for parameter-efficient fine-tuning and 0.001 for full fine-tuning. Unlike LIFT (Shi et al., 2024), all models in our experiments are fine-tuned for 20 epochs across datasets and methods.
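The quoted setup anneals the learning rate from its initial value to 0 over 200 epochs with a cosine schedule. A minimal sketch of that schedule (the function name and exact parameterization are ours, not the paper's; frameworks such as PyTorch provide this as a built-in scheduler):

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_init=0.1):
    """Cosine-annealed learning rate: starts at lr_init, decays smoothly to 0
    by total_epochs, matching the schedule described in the setup."""
    return 0.5 * lr_init * (1 + math.cos(math.pi * epoch / total_epochs))

# Learning rate at each epoch of a 200-epoch run starting from 0.1.
lrs = [cosine_lr(e) for e in range(201)]
```

Under this parameterization the rate is 0.1 at epoch 0, 0.05 at the midpoint (epoch 100), and 0 at epoch 200, decreasing monotonically throughout.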