A Note On The Stability Of The Focal Loss

Authors: Martijn P. van Leeuwen, Koen V. Haak, Gorkem Saygili, Eric O. Postma, L.L. Sharon Ong

TMLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "In this paper, we highlight an unaddressed numerical instability of the Focal Loss that arises when this focusing parameter is set to a value between 0 and 1. We present the theoretical basis of this numerical instability, show that it can be detected in the computation of Focal Loss gradients, and demonstrate its effects across several classification and segmentation tasks. Additionally, we propose a straightforward modification to the original Focal Loss to ensure stability whenever these unstable focusing parameter values are used. To demonstrate the instability of the Focal Loss, we conducted several experiments."

Researcher Affiliation | Academia | "Martijn P. van Leeuwen EMAIL Dept. of Intelligent Systems, Research Center for Cognitive Science and Artificial Intelligence, Tilburg School of Humanities and Digital Sciences, Tilburg University"

Pseudocode | No | The paper includes mathematical derivations and a Python implementation of the modified Focal Loss in Appendix A.3, but no clearly labeled "Pseudocode" or "Algorithm" block.

Open Source Code | Yes | "A.3 Modified Focal Loss: In this Appendix, we show the Python implementation of the modified version of the original Focal Loss (Lin et al., 2017). Modifications to the original code are indicated by the '#Modification' comment."

Open Datasets | Yes | "First, we tested whether the instability could occur when training a basic convolutional neural network (CNN) (shown in Appendix A.4) to perform a binary classification task on the MNIST dataset (Deng, 2012). We then examined if this instability could be induced on a larger and more complex dataset, the CIFAR-10 dataset (Krizhevsky et al., 2009)."

Dataset Splits | Yes | "In the first experiment, we divided the MNIST dataset (Deng, 2012) ... into two classes, where a threshold determined which numbers belonged to which class. ... Table 1: The number of samples in each class of the MNIST dataset (Deng, 2012) when using different class distributions. The class distribution was determined by a binarization threshold ranging from 0 to 8. ... The CIFAR-10 dataset (Krizhevsky et al., 2009) ... the classes were partitioned into two classes: animals and vehicles. Each of the original CIFAR-10 classes contained 5000 images, and with 6 animal and 4 vehicle classes, the classes of the restructured dataset became slightly imbalanced (6:4 ratio). ... We trained on the complete training dataset, as we did not perform any hyperparameter optimization or testing."

Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory amounts) used for running its experiments.

Software Dependencies | No | The paper mentions using PyTorch and torchvision.utils in the code snippet in Appendix A.3, but does not provide version numbers for these software dependencies.

Experiment Setup | Yes | "In this experiment, we tested stable (0, 1, 2, 3, 4, 5) and unstable γ values (0 < γ < 1) ... Both models were trained for 1000 epochs with a batch size of 128 and a γ and αt of 0.5 without using pre-trained weights. ... We ran the experiments in this paper with a value of ϵ equal to 1e-3."
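The instability the paper reports can be illustrated with the gradient of the Focal Loss modulating factor (1 − p)^γ: its derivative with respect to p contains (1 − p)^(γ − 1), so for 0 < γ < 1 the exponent is negative and a perfectly classified sample (p rounding to exactly 1.0 in floating point) produces 0 raised to a negative power. The sketch below is a minimal illustration of this, not the paper's actual code (which is in its Appendix A.3); the `safe_modulating_grad` clamp is a hypothetical fix in the spirit of the paper's ϵ-based modification, using the ϵ = 1e-3 the paper reports.

```python
def modulating_grad(p, gamma):
    """Gradient of the Focal Loss modulating factor (1 - p)**gamma wrt p:
    d/dp (1 - p)^gamma = -gamma * (1 - p)^(gamma - 1)."""
    return -gamma * (1.0 - p) ** (gamma - 1.0)

# Stable regime: gamma >= 1 keeps the exponent non-negative,
# so p == 1.0 simply yields a zero gradient.
print(modulating_grad(1.0, gamma=2.0))  # -0.0

# Unstable regime: 0 < gamma < 1 makes the exponent negative, so a
# perfectly classified sample (p == 1.0 in float arithmetic) evaluates
# 0^(negative power) -- a division by zero, which in autograd frameworks
# typically surfaces as inf/NaN gradients.
try:
    modulating_grad(1.0, gamma=0.5)
except ZeroDivisionError as exc:
    print("unstable:", exc)

# Hypothetical stabilization (assumption, not the paper's exact code):
# clamp (1 - p) away from zero with a small epsilon before exponentiation.
def safe_modulating_grad(p, gamma, eps=1e-3):
    return -gamma * max(1.0 - p, eps) ** (gamma - 1.0)

print(safe_modulating_grad(1.0, gamma=0.5))  # finite value
```

Note that the divergence only occurs when the exponent γ − 1 is negative, which is exactly the 0 < γ < 1 range the paper flags as unstable; for the commonly used γ ≥ 1 settings the gradient at p = 1 is well defined.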