From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
Authors: Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč, Eric Moulines, Maxim Panov
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on image datasets by evaluating its performance in detecting out-of-distribution and misclassified instances using the AUROC metric. The experimental results confirm that the measures derived from our framework are useful for the considered downstream tasks. [...] We experimentally evaluate different predictive uncertainty quantification measures from the proposed framework in various tasks. Specifically, we consider out-of-distribution detection and misclassification detection; see Section 6. |
| Researcher Affiliation | Academia | Nikita Kotelevskii (1,2), Vladimir Kondratyev (3), Martin Takáč (1), Éric Moulines (3,1), Maxim Panov (1). 1: Department of Machine Learning, MBZUAI, UAE; 2: CAIT, Skoltech, Russia; 3: CMAP, École polytechnique, France |
| Pseudocode | No | The paper does not contain any explicit sections or figures labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The source code is publicly available at https://github.com/stat-ml/uncertainty_from_proper_scoring_rules/. |
| Open Datasets | Yes | As training (in-distribution) datasets, we consider CIFAR10, CIFAR100 (Krizhevsky, 2009), and Tiny ImageNet (Le & Yang, 2015). |
| Dataset Splits | No | The paper mentions using CIFAR10, CIFAR100, and Tiny ImageNet, as well as their noisy versions (CIFAR10-N, CIFAR100-N) and out-of-distribution variants (CIFAR10C, ImageNet-O, ImageNet-A, ImageNet-R). It specifies that original versions are used for misclassification detection. However, it does not explicitly provide percentages, sample counts, or a specific methodology for training/validation/test splits for any of these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using code from repositories like 'https://github.com/kuangliu/pytorch-cifar' and 'https://github.com/weiaicunzai/pytorch-cifar100', and pre-trained models from 'https://github.com/ENSTA-U2IS-AI/torch-uncertainty'. While these imply the use of PyTorch, specific version numbers for PyTorch, Python, or other libraries are not provided. |
| Experiment Setup | Yes | We used ResNet18 (He et al., 2016) as the architecture (additional details can be found in Appendix H). [...] The training procedure consisted of 200 epochs with a cosine annealing learning rate. For an optimizer, we use SGD with momentum and weight decay. [...] For CIFAR100-based datasets, we used code from this repository: https://github.com/weiaicunzai/pytorch-cifar100. The training procedure consisted of 200 epochs with learning rate decay at particular milestones: [60, 120, 160]. For an optimizer, we use SGD with momentum and weight decay. |
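The two training schedules quoted above (cosine annealing for the CIFAR10-based runs, milestone decay at [60, 120, 160] for the CIFAR100-based runs) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' code: the concrete hyperparameter values (initial learning rate 0.1, momentum 0.9, weight decay 5e-4, milestone decay factor 0.2) and the placeholder model are assumptions, since the paper states only the optimizer family and schedule shapes.

```python
# Hedged sketch of the training schedules described in the Experiment Setup row.
# Assumed values (NOT from the paper): lr=0.1, momentum=0.9, weight_decay=5e-4,
# gamma=0.2 for milestone decay; nn.Linear stands in for ResNet18.
import torch
from torch import nn, optim

EPOCHS = 200  # stated in the paper for both setups

model = nn.Linear(10, 10)  # placeholder for ResNet18 (He et al., 2016)

# CIFAR10-based runs: SGD with momentum and weight decay, cosine annealing LR.
opt_c10 = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched_c10 = optim.lr_scheduler.CosineAnnealingLR(opt_c10, T_max=EPOCHS)

# CIFAR100-based runs: same optimizer, step decay at milestones [60, 120, 160].
opt_c100 = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched_c100 = optim.lr_scheduler.MultiStepLR(
    opt_c100, milestones=[60, 120, 160], gamma=0.2
)

for epoch in range(EPOCHS):
    # ... one training epoch over the in-distribution dataset would go here ...
    sched_c10.step()
    sched_c100.step()
```

With these assumed values, the cosine schedule anneals the learning rate from 0.1 down to 0 over the 200 epochs, while the milestone schedule holds it piecewise constant and multiplies it by `gamma` at epochs 60, 120, and 160.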