From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

Authors: Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč, Eric Moulines, Maxim Panov

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on image datasets by evaluating its performance in detecting out-of-distribution and misclassified instances using the AUROC metric. The experimental results confirm that the measures derived from our framework are useful for the considered downstream tasks. [...] We experimentally evaluate different predictive uncertainty quantification measures from the proposed framework in various tasks. Specifically, we consider out-of-distribution detection and misclassification detection; see Section 6.
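The AUROC evaluation quoted above treats uncertainty as a score for separating in-distribution from out-of-distribution (or correctly from incorrectly classified) inputs. A minimal sketch of that computation, using synthetic illustrative scores rather than the paper's actual outputs:

```python
# Sketch of AUROC-based OOD detection evaluation. The score distributions
# below are synthetic placeholders, NOT results from the paper: we assume
# in-distribution inputs get lower uncertainty than OOD inputs.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
id_scores = rng.normal(loc=0.2, scale=0.1, size=1000)   # in-distribution: low uncertainty
ood_scores = rng.normal(loc=0.8, scale=0.1, size=1000)  # out-of-distribution: high uncertainty

# Label OOD as the positive class and rank all inputs by uncertainty.
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
scores = np.concatenate([id_scores, ood_scores])
auroc = roc_auc_score(labels, scores)
print(f"AUROC: {auroc:.3f}")  # near 1.0 when the uncertainty measure separates the groups
```

Misclassification detection follows the same recipe, with "misclassified" replacing "OOD" as the positive class.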
Researcher Affiliation | Academia | Nikita Kotelevskii (1,2), Vladimir Kondratyev (3), Martin Takáč (1), Éric Moulines (3,1), Maxim Panov (1). 1: Department of Machine Learning, MBZUAI, UAE; 2: CAIT, Skoltech, Russia; 3: CMAP, École polytechnique, France.
Pseudocode | No | The paper does not contain any explicit sections or figures labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code is publicly available at https://github.com/stat-ml/uncertainty_from_proper_scoring_rules/.
Open Datasets | Yes | As training (in-distribution) datasets, we consider CIFAR10, CIFAR100 (Krizhevsky, 2009), and Tiny ImageNet (Le & Yang, 2015).
Dataset Splits | No | The paper mentions using CIFAR10, CIFAR100, and Tiny ImageNet, as well as their noisy versions (CIFAR10-N, CIFAR100-N) and out-of-distribution variants (CIFAR10C, ImageNet-O, ImageNet-A, ImageNet-R). It specifies that original versions are used for misclassification detection. However, it does not explicitly provide percentages, sample counts, or specific methodology for training/validation/test splits for any of these datasets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions using code from repositories like 'https://github.com/kuangliu/pytorch-cifar' and 'https://github.com/weiaicunzai/pytorch-cifar100', and pre-trained models from 'https://github.com/ENSTA-U2IS-AI/torch-uncertainty'. While these imply the use of PyTorch, specific version numbers for PyTorch, Python, or other libraries are not provided.
Experiment Setup | Yes | We used ResNet18 (He et al., 2016) as the architecture (additional details can be found in Appendix H). [...] The training procedure consisted of 200 epochs with a cosine annealing learning rate. For an optimizer, we use SGD with momentum and weight decay. [...] For CIFAR100-based datasets, we used code from this repository: https://github.com/weiaicunzai/pytorch-cifar100. The training procedure consisted of 200 epochs with learning rate decay at particular milestones: [60, 120, 160]. For an optimizer, we use SGD with momentum and weight decay.
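The quoted setup specifies only the schedule shapes: cosine annealing over 200 epochs for the CIFAR10-based runs, and milestone decay at epochs [60, 120, 160] for the CIFAR100-based runs. A minimal sketch of the two schedules, where the base learning rate (0.1) and decay factor (0.2) are typical CIFAR defaults assumed for illustration, not values stated in the paper:

```python
# Sketch of the two learning-rate schedules described in the quoted setup.
# BASE_LR = 0.1 and GAMMA = 0.2 are assumed illustrative defaults; the paper
# quote gives only the schedule shapes, not these hyperparameter values.
import math

EPOCHS = 200
BASE_LR = 0.1  # assumed
GAMMA = 0.2    # assumed step-decay factor

def cosine_annealing(epoch, base_lr=BASE_LR, t_max=EPOCHS, eta_min=0.0):
    """Cosine annealing (CIFAR10-based runs): decays from base_lr to eta_min."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

def milestone_decay(epoch, base_lr=BASE_LR, milestones=(60, 120, 160), gamma=GAMMA):
    """Milestone decay (CIFAR100-based runs): multiply by gamma at each passed milestone."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# Sample the schedules at a few epochs to see their shapes.
for epoch in (0, 60, 120, 160, 199):
    print(epoch, cosine_annealing(epoch), milestone_decay(epoch))
```

Both shapes correspond to standard PyTorch schedulers (`CosineAnnealingLR` and `MultiStepLR`), which the cited training repositories use.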