Optimizing Estimators of Squared Calibration Errors in Classification
Authors: Sebastian Gregor Gruber, Francis R. Bach
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our pipeline by optimizing existing calibration estimators and comparing them with novel kernel ridge regression-based estimators on real-world image classification tasks. [...] Section 5 Experiments |
| Researcher Affiliation | Academia | Sebastian G. Gruber: German Cancer Consortium (DKTK), partner site Frankfurt/Mainz, a partnership between DKFZ and UCT Frankfurt-Marburg, Frankfurt am Main, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Goethe University Frankfurt, Germany. Francis Bach: Inria, Ecole Normale Supérieure, PSL Research University, Paris, France |
| Pseudocode | Yes | Algorithm 1 Evaluating the calibration of a given classifier and dataset by optimizing the calibration estimator. The evaluation dataset is split into a holdout set for estimating the calibration error, and another set, which is used for optimizing the calibration estimator via cross-validation. |
| Open Source Code | Yes | The source code is publicly available at https://github.com/SebGGruber/Optimizing_Calibration_Estimators. |
| Open Datasets | Yes | The image classification datasets in use are CIFAR10 with 10 classes, CIFAR100 with 100 classes (Krizhevsky, 2009), and ImageNet with 1,000 classes (Deng et al., 2009). [...] We train the Vision Transformer architecture (Dosovitskiy et al., 2020) on the MedMNIST datasets (Yang et al., 2021). |
| Dataset Splits | Yes | We run the calibration-evaluation pipeline proposed in Section 3.2.2 with a random split of the original test set, using 80% for tuning the calibration estimator function via cross-validation and 20% for the calibration test set D_te, which computes the mean in Equation (21). In all experiments, we use 5-fold cross-validation to optimize the hyperparameters of a calibration estimator function. |
| Hardware Specification | Yes | All experiments are run on an Intel(R) Xeon(R) Gold 5218R with 2.1 GHz and a MacBook Pro M1. |
| Software Dependencies | No | The paper mentions using an 'implementation of the calibration estimator function h_kde given by the original authors (Popordanoska et al., 2022b)' and 'a pre-trained classifier from Huggingface and fine-tune with a modification of (Capelle, 2022)'. However, it does not provide specific version numbers for these or for any other software libraries used in its own methodology. |
| Experiment Setup | Yes | As hyperparameter search spaces for the TCE experiments, we consider {5i \| i = 1, …, 20} for the number of bins in h_bin, a bandwidth in {10^{−5(i−1)/14 − (1−(i−1)/14)} \| i = 1, …, 15} ∪ {0.2i \| i = 1, …, 5} for the Dirichlet kernel of h_kde according to Popordanoska et al. (2022a), a regularization constant λ ∈ {n^{0.5}·10^{−2i+1} \| i = 1, …, 9} for h_kkr, and λ ∈ {n^{0.5}·10^{−i} \| i = 1, …, 9} for h_ukkr. For the CCE experiments, we consider the same set of bandwidths for the Dirichlet kernel of h_kde, a regularization constant λ ∈ {n^{0.5}·10^{−i+9} \| i = 1, …, 18} for h_kkr, and λ ∈ {n^{0.5}·10^{−0.5i+4.5} \| i = 1, …, 18} for h_ukkr. |
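
The dataset split and hyperparameter grids quoted in the table above can be written out programmatically. The sketch below is illustrative only: the exponents in the grids are reconstructed from the PDF text (the extraction appears to have dropped minus signs), and the function names are our own, not the authors'.

```python
import random

def split_eval_set(n_test, seed=0):
    """80/20 split of the original test set, as described in the Dataset
    Splits row: 80% for tuning the calibration estimator via 5-fold
    cross-validation, 20% as the held-out calibration test set D_te.
    Names and seed handling here are hypothetical."""
    idx = list(range(n_test))
    random.Random(seed).shuffle(idx)
    n_cv = int(0.8 * n_test)
    cv_idx, te_idx = idx[:n_cv], idx[n_cv:]
    folds = [cv_idx[k::5] for k in range(5)]  # 5-fold CV on the tuning split
    return folds, te_idx

def tce_grids(n):
    """TCE hyperparameter grids as quoted in the Experiment Setup row
    (reconstructed exponents; treat signs as an assumption)."""
    bins = [5 * i for i in range(1, 21)]  # number of bins for h_bin
    # Bandwidths for the Dirichlet kernel of h_kde: 15 log-spaced values
    # between 10^-1 and 10^-5, plus {0.2, 0.4, 0.6, 0.8, 1.0}.
    bandwidths = sorted(
        {10 ** (-5 * (i - 1) / 14 - (1 - (i - 1) / 14)) for i in range(1, 16)}
        | {0.2 * i for i in range(1, 6)}
    )
    lam_kkr = [n ** 0.5 * 10 ** (-2 * i + 1) for i in range(1, 10)]  # h_kkr
    lam_ukkr = [n ** 0.5 * 10 ** (-i) for i in range(1, 10)]         # h_ukkr
    return bins, bandwidths, lam_kkr, lam_ukkr
```

Each grid would then be searched with 5-fold cross-validation on the 80% tuning portion, and the selected estimator evaluated once on D_te.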