Assuming Locally Equal Calibration Errors for Non-Parametric Multiclass Calibration
Authors: Kaspar Valk, Meelis Kull
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, experiments are carried out to compare the proposed method to its competitors. The source code of the experiments is available at https://github.com/kaspar98/lece-calibration. ... to offer improvements to the state-of-the-art according to the expected calibration error metric on CIFAR-10 and CIFAR-100 datasets. |
| Researcher Affiliation | Academia | Meelis Kull EMAIL University of Tartu. This paper builds upon preliminary research conducted for the author's master's thesis (Valk, 2022). |
| Pseudocode | Yes | Algorithm 1: LECE calibration method. Input: predictions on the validation set $\hat{p}_1, \dots, \hat{p}_n$; validation set labels $y_1, \dots, y_n$; prediction to calibrate $\hat{p}$; neighborhood size $k$; distance function $d$; threshold $t$; number of classes $m$. Output: calibrated prediction $\hat{c}(\hat{p})$. |
| Open Source Code | Yes | The source code of the experiments is available at https://github.com/kaspar98/lece-calibration. |
| Open Datasets | Yes | CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) are used for the experiments. |
| Dataset Splits | Yes | An overview of the used datasets, classifiers, and dataset sizes is given in Table 2. ... Table 2: Datasets and model details for the real experiments. ... CIFAR-10 ... Training 45000 Validation 5000 Test 10000 ... CIFAR-100 ... Training 45000 Validation 5000 Test 10000 |
| Hardware Specification | Yes | The real data experiments were implemented in Python and run on a machine with 16 GBs of RAM and a CPU with clock speed 3.7 GHz. |
| Software Dependencies | No | The real data experiments were implemented in Python and run on a machine with 16 GBs of RAM and a CPU with clock speed 3.7 GHz. The paper mentions Python as the implementation language but does not specify any version numbers for Python or other libraries. |
| Experiment Setup | Yes | optimal hyperparameters were found with 10-fold cross-validation grid search optimized by log-loss from neighborhood size proportions $q$ of the training dataset, $q = k/n \in \{0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.1, 0.2, 1.0\}$, and threshold values $t \in \{0, 0.00125, 0.0025, 0.005, 0.01, 0.02, 0.04, 0.05, 0.10, 1.0\}$. |
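
The grid quoted in the Experiment Setup row can be enumerated as in the minimal sketch below. It only lists the candidate (k, t) pairs; it does not implement the LECE method or the cross-validation loop. The function name `hyperparameter_grid`, the rounding of q·n to an integer k, and the example value n = 4500 are assumptions made for illustration and are not taken from the paper or its repository.

```python
from itertools import product

# Grid values as quoted in the Experiment Setup row.
Q_VALUES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.1, 0.2, 1.0]
T_VALUES = [0, 0.00125, 0.0025, 0.005, 0.01, 0.02, 0.04, 0.05, 0.10, 1.0]


def hyperparameter_grid(n):
    """Return every (k, t) candidate for a calibration set of size n.

    Assumption: the neighborhood size is k = round(q * n), clipped to at
    least 1. The rounding rule is hypothetical, not stated in the source.
    """
    grid = []
    for q, t in product(Q_VALUES, T_VALUES):
        k = max(1, round(q * n))
        grid.append((k, t))
    return grid


# Hypothetical example: 4500 instances in each cross-validation training fold.
grid = hyperparameter_grid(4500)
print(len(grid))  # 10 proportions x 10 thresholds = 100 candidates
```

Each candidate pair would then be scored by log-loss under 10-fold cross-validation, as the row describes, and the best-scoring pair kept.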