A Note On The Stability Of The Focal Loss
Authors: Martijn P. van Leeuwen, Koen V. Haak, Gorkem Saygili, Eric O. Postma, L.L. Sharon Ong
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we highlight an unaddressed numerical instability of the Focal Loss that arises when this focusing parameter is set to a value between 0 and 1. We present the theoretical basis of this numerical instability, show that it can be detected in the computation of Focal Loss gradients, and demonstrate its effects across several classification and segmentation tasks. Additionally, we propose a straightforward modification to the original Focal Loss to ensure stability whenever these unstable focusing parameter values are used. To demonstrate the instability of the Focal Loss, we conducted several experiments. |
| Researcher Affiliation | Academia | Martijn P. van Leeuwen EMAIL Dept. of Intelligent Systems, Research Center for Cognitive Science and Artificial Intelligence, Tilburg School of Humanities and Digital Sciences, Tilburg University |
| Pseudocode | No | The paper includes mathematical derivations and a Python implementation of the modified Focal Loss in Appendix A.3, but no clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | A.3 Modified Focal Loss In this Appendix, we show the Python implementation of the modified version of the original Focal loss Lin et al. (2017). Modifications to the original code are indicated by the "#Modification" comment. |
| Open Datasets | Yes | First, we tested whether the instability could occur when training a basic convolutional neural network (CNN) (shown in Appendix A.4) to perform a binary classification task on the MNIST dataset (Deng, 2012). We then examined if this instability could be induced on a larger and more complex dataset, the CIFAR-10 dataset (Krizhevsky et al., 2009) |
| Dataset Splits | Yes | In the first experiment, we divided the MNIST dataset (Deng, 2012)... into two classes, where a threshold determined which numbers belonged to which class. ... Table 1: The number of samples in each class of the MNIST dataset (Deng, 2012) when using different class distributions. The class distribution was determined by a binarization threshold ranging from 0 to 8. ... The CIFAR-10 dataset (Krizhevsky et al., 2009) ... the classes were partitioned into two classes: animals and vehicles. Each class of the original CIFAR-10 classes contained 5000 images, and with 6 animal and 4 vehicle classes, the classes for the restructured dataset became slightly imbalanced (6:4 ratio). ... We trained on the complete training dataset, as we did not perform any hyperparameter optimization or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and torchvision.utils in the code snippet in Appendix A.3, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | In this experiment, we tested stable γ values (0, 1, 2, 3, 4, 5) and unstable γ values (0 < γ < 1)... Both models were trained for 1000 epochs with a batch size of 128 and a γ and αₜ of 0.5 without using pre-trained weights. ...We ran the experiments in this paper with a value of ϵ equal to 1e-3 |
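The instability the checklist quotes can be illustrated with a short numerical sketch. The focal modulating factor is (1 − pₜ)^γ, whose derivative with respect to pₜ contains (1 − pₜ)^(γ−1); for 0 < γ < 1 this exponent is negative, so the term diverges as pₜ → 1, while for γ ≥ 1 it stays bounded. The sketch below demonstrates this divergence and a clamp-based stabilization using ϵ = 1e-3; the function names are illustrative, and the clamp is an *assumed* form of the paper's modification, not its exact code:

```python
def focal_factor_grad(p_t, gamma):
    # d/dp_t of the modulating factor (1 - p_t)**gamma:
    #   -gamma * (1 - p_t)**(gamma - 1)
    # For 0 < gamma < 1 the exponent is negative, so this blows up as p_t -> 1.
    return -gamma * (1.0 - p_t) ** (gamma - 1.0)

def stable_focal_factor_grad(p_t, gamma, eps=1e-3):
    # Assumed stabilization: clamp (1 - p_t) away from zero before exponentiating,
    # using the epsilon value (1e-3) reported in the paper's experiments.
    return -gamma * max(1.0 - p_t, eps) ** (gamma - 1.0)

p_t = 0.999999  # a confidently-classified example, p_t close to 1

# gamma = 2 (a "stable" value): gradient of the factor vanishes near p_t = 1
g_stable = focal_factor_grad(p_t, gamma=2.0)

# gamma = 0.5 (an "unstable" value): gradient of the factor explodes
g_unstable = focal_factor_grad(p_t, gamma=0.5)

# With the epsilon clamp, the gamma = 0.5 gradient stays bounded
g_clamped = stable_focal_factor_grad(p_t, gamma=0.5)
```

Here `abs(g_stable)` is tiny, `abs(g_unstable)` is several hundred, and `abs(g_clamped)` is bounded by roughly γ · ϵ^(γ−1) regardless of how close pₜ gets to 1, which is the behavior the paper's modification is meant to guarantee.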