A Note On The Stability Of The Focal Loss

Authors: Martijn P. van Leeuwen, Koen V. Haak, Gorkem Saygili, Eric O. Postma, L.L. Sharon Ong

TMLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "In this paper, we highlight an unaddressed numerical instability of the Focal Loss that arises when this focusing parameter is set to a value between 0 and 1. We present the theoretical basis of this numerical instability, show that it can be detected in the computation of Focal Loss gradients, and demonstrate its effects across several classification and segmentation tasks. Additionally, we propose a straightforward modification to the original Focal Loss to ensure stability whenever these unstable focusing parameter values are used. To demonstrate the instability of the Focal Loss, we conducted several experiments."

Researcher Affiliation | Academia | "Martijn P. van Leeuwen EMAIL Dept. of Intelligent Systems, Research Center for Cognitive Science and Artificial Intelligence, Tilburg School of Humanities and Digital Sciences, Tilburg University"

Pseudocode | No | The paper includes mathematical derivations and a Python implementation of the modified Focal Loss in Appendix A.3, but no clearly labeled "Pseudocode" or "Algorithm" block.

Open Source Code | Yes | "A.3 Modified Focal Loss: In this Appendix, we show the Python implementation of the modified version of the original Focal Loss (Lin et al., 2017). Modifications to the original code are indicated by the '#Modification' comment."

Open Datasets | Yes | "First, we tested whether the instability could occur when training a basic convolutional neural network (CNN) (shown in Appendix A.4) to perform a binary classification task on the MNIST dataset (Deng, 2012). We then examined if this instability could be induced on a larger and more complex dataset, the CIFAR-10 dataset (Krizhevsky et al., 2009)."

Dataset Splits | Yes | "In the first experiment, we divided the MNIST dataset (Deng, 2012) ... into two classes, where a threshold determined which numbers belonged to which class. ... Table 1: The number of samples in each class of the MNIST dataset (Deng, 2012) when using different class distributions. The class distribution was determined by a binarization threshold ranging from 0 to 8. ... The CIFAR-10 dataset (Krizhevsky et al., 2009) ... the classes were partitioned into two classes: animals and vehicles. Each of the original CIFAR-10 classes contained 5000 images, and with 6 animal and 4 vehicle classes, the classes of the restructured dataset became slightly imbalanced (6:4 ratio). ... We trained on the complete training dataset, as we did not perform any hyperparameter optimization or testing."

Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory amounts) used for running its experiments.

Software Dependencies | No | The paper mentions using PyTorch and torchvision.utils in the code snippet in Appendix A.3, but does not provide version numbers for these software dependencies.

Experiment Setup | Yes | "In this experiment, we tested stable (0, 1, 2, 3, 4, 5) and unstable γ values (0 < γ < 1) ... Both models were trained for 1000 epochs with a batch size of 128 and a γ and αt of 0.5 without using pre-trained weights. ... We ran the experiments in this paper with a value of ϵ equal to 1e-3."
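The instability the paper reports can be illustrated with the gradient of the Focal Loss modulating factor (1 − p)^γ: its derivative with respect to p contains (1 − p)^(γ − 1), so for 0 < γ < 1 the exponent is negative and a perfectly classified sample (p rounding to exactly 1.0 in floating point) produces 0 raised to a negative power. The sketch below is a minimal illustration of this, not the paper's actual code (which is in its Appendix A.3); the `safe_modulating_grad` clamp is a hypothetical fix in the spirit of the paper's ϵ-based modification, using the ϵ = 1e-3 the paper reports.

```python
def modulating_grad(p, gamma):
    """Gradient of the Focal Loss modulating factor (1 - p)**gamma wrt p:
    d/dp (1 - p)^gamma = -gamma * (1 - p)^(gamma - 1)."""
    return -gamma * (1.0 - p) ** (gamma - 1.0)

# Stable regime: gamma >= 1 keeps the exponent non-negative,
# so p == 1.0 simply yields a zero gradient.
print(modulating_grad(1.0, gamma=2.0))  # -0.0

# Unstable regime: 0 < gamma < 1 makes the exponent negative, so a
# perfectly classified sample (p == 1.0 in float arithmetic) evaluates
# 0^(negative power) -- a division by zero, which in autograd frameworks
# typically surfaces as inf/NaN gradients.
try:
    modulating_grad(1.0, gamma=0.5)
except ZeroDivisionError as exc:
    print("unstable:", exc)

# Hypothetical stabilization (assumption, not the paper's exact code):
# clamp (1 - p) away from zero with a small epsilon before exponentiation.
def safe_modulating_grad(p, gamma, eps=1e-3):
    return -gamma * max(1.0 - p, eps) ** (gamma - 1.0)

print(safe_modulating_grad(1.0, gamma=0.5))  # finite value
```

Note that the divergence only occurs when the exponent γ − 1 is negative, which is exactly the 0 < γ < 1 range the paper flags as unstable; for the commonly used γ ≥ 1 settings the gradient at p = 1 is well defined.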