Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

Authors: Sicong Li, Qianqian Xu, Zhiyong Yang, Zitai Wang, Linchao Zhang, Xiaochun Cao, Qingming Huang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on both traditional and foundation models validate the effectiveness of Focal-SAM. ... Finally, we conduct extensive experiments on various benchmark datasets to validate the effectiveness of Focal-SAM, including training ResNet models from scratch and finetuning the foundation model CLIP (Radford et al., 2021).
Researcher Affiliation Collaboration 1Institute of Information Engineering, CAS 2School of Cyber Security, University of Chinese Academy of Sciences 3Key Lab. of Intelligent Information Processing, Institute of Computing Tech., CAS 4School of Computer Science and Tech., University of Chinese Academy of Sciences 5Artificial Intelligence Institute of China Electronics Technology Group Corporation, 6School of Cyber Science and Tech., Shenzhen Campus of Sun Yat-sen University 7BDKM, University of Chinese Academy of Sciences. Correspondence to: Qianqian Xu <EMAIL>, Qingming Huang <EMAIL>.
Pseudocode Yes Overall, Alg. 1 gives the pseudo-code to optimize the Focal-SAM objective, using SGD as the base optimizer. Algorithm 1 Focal-SAM algorithm. Input: Training set S, perturbation radius ρ, hyperparameters λ, γ, learning rate η. Output: Model trained with Focal-SAM.
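The quoted algorithm describes a SAM-style procedure with SGD as the base optimizer. As a rough illustration only, the sketch below shows the generic perturb-then-update structure such a procedure follows on a toy quadratic loss; the actual Focal-SAM objective and its λ, γ hyperparameters are not reproduced here, and the function names are our own.

```python
import numpy as np

def loss(w, X, y):
    # mean squared error for a toy linear model (stand-in for the real objective)
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # gradient of the toy loss above
    return X.T @ (X @ w - y) / len(y)

def sam_sgd_step(w, X, y, rho=0.05, eta=0.1):
    """One SAM-style step: perturb the weights along the ascent direction
    (scaled to radius rho), then take an SGD step using the gradient
    evaluated at the perturbed point."""
    g = grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation within radius rho
    g_adv = grad(w + eps, X, y)                  # gradient at the perturbed weights
    return w - eta * g_adv                       # base SGD update with learning rate eta

# Toy data: recoverable linear target, so the iterates should approach w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(200):
    w = sam_sgd_step(w, X, y)
```

This only demonstrates the two-gradient structure (one to build the perturbation, one to update); Focal-SAM's class-wise focal weighting of the sharpness term is what the λ and γ hyperparameters control in the paper's Algorithm 1.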
Open Source Code No The paper does not provide any explicit statements about code availability, a direct link to a code repository, or mention of code in supplementary materials.
Open Datasets Yes Datasets. We use four widely adopted long-tailed datasets for long-tailed recognition tasks: CIFAR-10 LT (Cao et al., 2019), CIFAR-100 LT (Cao et al., 2019), ImageNet-LT (Liu et al., 2019) and iNaturalist (Horn et al., 2018). ... Specifically, we train the model on ImageNet-LT and evaluate it on three OOD datasets: ImageNet-Sketch (Wang et al., 2019a), ImageNetV2 (Recht et al., 2019), and ImageNet-C (Hendrycks & Dietterich, 2019).
Dataset Splits Yes CIFAR-100 LT and CIFAR-10 LT (Cao et al., 2019). The original CIFAR-100 (Krizhevsky & Hinton, 2009) and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets contain 50,000 training images and 10,000 testing images for 100 and 10 classes, respectively. ... ImageNet-LT (Liu et al., 2019). ... includes 115,846 training images and 50,000 test images. ... iNaturalist (Horn et al., 2018). ... The training set contains approximately 430,000 images, while the test set contains about 24,000 images.
Hardware Specification Yes C.5. Experimental Hardware Setup All the experiments are conducted on Ubuntu servers equipped with Nvidia(R) RTX 3090 GPUs and RTX 4090 GPUs. Fine-tuning the foundation models is performed using a single GPU for all datasets. The number of GPUs used for training the ResNet models from scratch varies based on dataset size: a single GPU for the CIFAR-LT datasets, 2 GPUs for the ImageNet-LT dataset, and 4 GPUs for the iNaturalist dataset.
Software Dependencies No The paper mentions using 'SGD as the base optimizer' but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes C.4. Implementation Details ... We employ stochastic gradient descent (SGD) as the base optimizer, with an initial learning rate of 0.1, a batch size of 64, and a momentum of 0.9. Training spans 200 epochs, using a cosine annealing scheduler to reduce the learning rate from 0.1 to 0 gradually. ... For ImageNet-LT, the initial learning rate is set to 0.1, with a batch size of 256, while for iNaturalist, the initial learning rate is 0.2, and the batch size is increased to 512. Training for these datasets also lasts 200 epochs with a cosine annealing scheduler. ... The initial learning rate is 0.01 for parameter-efficient fine-tuning and 0.001 for full fine-tuning. Unlike LIFT (Shi et al., 2024), all models in our experiments are fine-tuned for 20 epochs across datasets and methods.
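The quoted setup anneals the learning rate from its initial value to 0 over 200 epochs with a cosine schedule. A minimal sketch of that schedule (the function name and exact parameterization are ours, not the paper's; frameworks such as PyTorch provide this as a built-in scheduler):

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_init=0.1):
    """Cosine-annealed learning rate: starts at lr_init, decays smoothly to 0
    by total_epochs, matching the schedule described in the setup."""
    return 0.5 * lr_init * (1 + math.cos(math.pi * epoch / total_epochs))

# Learning rate at each epoch of a 200-epoch run starting from 0.1.
lrs = [cosine_lr(e) for e in range(201)]
```

Under this parameterization the rate is 0.1 at epoch 0, 0.05 at the midpoint (epoch 100), and 0 at epoch 200, decreasing monotonically throughout.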