Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
Authors: Kaizheng Wang, Fabio Cuzzolin, Keivan Shariatmadar, David Moens, Hans Hallez
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this article, we conduct extensive experiments on several out-of-distribution (OOD) detection benchmarks, encompassing various dataset pairs (CIFAR10/100 vs SVHN/TinyImageNet, CIFAR10 vs CIFAR10-C, CIFAR100 vs CIFAR100-C, and ImageNet vs ImageNet-O) and using different network architectures (such as VGG16, ResNet-18/50, EfficientNet-B2, and ViT-Base). Compared to the BNN and DE baselines, the proposed credal wrapper method exhibits superior performance in uncertainty estimation and achieves a lower expected calibration error on corrupted data. |
| Researcher Affiliation | Academia | 1DistriNet, Department of Computer Science, KU Leuven, Belgium 2LMSD, Department of Mechanical Engineering, KU Leuven, Belgium 3School of Engineering, Computing and Mathematics, Oxford Brookes University, UK 4M-Group, KU Leuven, Belgium 5Flanders Make@KU Leuven, Belgium |
| Pseudocode | Yes | Algorithm 1 Probability Interval Approximation. Input: intervals [p_L^k, p_U^k] for k = 1, ..., C; intersection probability p̂; chosen number of classes J. Output: approximated probability intervals [r_L^j, r_U^j] for j = 1, ..., J. 1: Compute the index vector m := (m_1, m_2, ..., m_C) sorting p̂ in descending order, so that p̂_{m_1} ≥ p̂_{m_2} ≥ ... ≥ p̂_{m_C}. 2: Set the upper and lower probability per selected class as r_L^j := p_L^{m_j}, r_U^j := p_U^{m_j} for j = 1, ..., J−1. 3: Set the lower and upper probability for the aggregated deselected class as r_L^J = max(1 − Σ_{j=1}^{J−1} r_U^j, Σ_{i=J}^{C} p_L^{m_i}); r_U^J = min(1 − Σ_{j=1}^{J−1} r_L^j, Σ_{i=J}^{C} p_U^{m_i}) |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT Detailed implementation description is provided in B. The experiment codes are provided here. |
| Open Datasets | Yes | In this experiment, we use dataset pairs of the type (ID samples vs OOD data), including CIFAR10 (Krizhevsky et al., 2009) vs SVHN (Hendrycks et al., 2021)/TinyImageNet (Le & Yang, 2015) and CIFAR10 vs CIFAR10-C (Hendrycks & Dietterich, 2019). ...The dataset pairs (ID vs OOD) considered include CIFAR10/CIFAR100 (Krizhevsky, 2012) vs SVHN/TinyImageNet, ImageNet (Deng et al., 2009) vs ImageNet-O (Hendrycks et al., 2021), CIFAR10 vs CIFAR10-C, and CIFAR100 vs CIFAR100-C (Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | The device is a single Tesla P100-SXM2-16GB GPU. The standard data split is applied. In terms of EDDs, for each learning rate scheduler configuration, as discussed in Sec. 4, we train 15 models for 100 epochs, one from each distinct DE. All other training configurations, such as data splitting and augmentation, are kept consistent with those used for the other models. ...Standard data split is applied to all training processes. |
| Hardware Specification | Yes | The device is a single Tesla P100-SXM2-16GB GPU. ...we mainly use two Tesla P100-SXM2-16GB GPUs as devices to construct deep ensembles by independently training 15 standard neural networks (SNNs) under different random seeds. ...Note that models based on ViT-B backbones are trained on one single Nvidia A100-SXM4-40GB GPU. In the ImageNet experiments, we employ one single Nvidia A100-SXM4-40GB GPU to retrain SNNs... The reported time cost in Table 4 is measured on a single Intel Xeon Gold 8358 CPU@2.6 GHz... |
| Software Dependencies | Yes | H(P) = maximize −Σ_{k=1}^{C} p_k log2 p_k s.t. Σ_{k=1}^{C} p_k = 1 and p_k ∈ [p_L^k, p_U^k] for k = 1, ..., C, (8) which can be addressed by using a standard solver, such as the SciPy optimization package (Virtanen et al., 2020). |
| Experiment Setup | Yes | The Adam optimizer is applied with a learning rate scheduler, initialized at 0.001, and subjected to a 0.1 reduction at epochs 80 and 120. Standard neural networks (SNNs) and BNNs are trained for 100 and 150 epochs, respectively. Standard data augmentation is uniformly implemented across all methodologies to enhance training performance. The training batch size is set as 128. ...The Adam optimizer is employed, with a learning rate scheduler set at 0.001 and reduced to 0.0001 during the final 5 training epochs. Concerning the CIFAR10 dataset, SNNs are trained using 15, 15, and 25 epochs for ResNet-50, EfficientNet-B2, and ViT-B backbones, respectively. As for the CIFAR100 dataset, SNNs are trained using 20, 20, and 25 epochs for ResNet-50, EfficientNet-B2, and ViT-B backbones, respectively. |
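The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched in Python. This is a reconstruction from the quoted pseudocode, not the authors' released code: the function name `approximate_intervals` and the exact form of the bounds for the merged class are assumptions.

```python
import numpy as np

def approximate_intervals(p_low, p_up, p_hat, J):
    """Sketch of Algorithm 1: keep the J-1 classes with the largest
    intersection probability p_hat and merge the rest into one class."""
    p_low, p_up, p_hat = map(np.asarray, (p_low, p_up, p_hat))
    m = np.argsort(-p_hat)           # step 1: index vector sorting p_hat descending
    sel, rest = m[:J - 1], m[J - 1:]
    r_low = list(p_low[sel])         # step 2: copy bounds of the selected classes
    r_up = list(p_up[sel])
    # Step 3 (reconstructed): bounds for the merged "deselected" class must
    # respect both the per-class intervals and the sum-to-one constraint.
    r_low.append(max(1.0 - sum(r_up), p_low[rest].sum()))
    r_up.append(min(1.0 - sum(r_low[:-1]), p_up[rest].sum()))
    return np.array(r_low), np.array(r_up)
```

For example, with four class intervals and J = 3, the two highest-probability classes keep their bounds and the remaining two are merged into a single third class whose interval is tightened by the sum-to-one constraint.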
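The entropy maximization of Eq. (8), quoted in the Software Dependencies row, can likewise be sketched with the SciPy optimization package the paper cites. The helper name `upper_entropy` and the choice of the SLSQP solver are assumptions; the paper only says a standard SciPy solver is used.

```python
import numpy as np
from scipy.optimize import minimize

def upper_entropy(p_low, p_up):
    """Sketch of Eq. (8): maximize Shannon entropy over probability
    vectors whose components lie in the intervals [p_low[k], p_up[k]]."""
    C = len(p_low)

    # Maximizing H(p) = -sum_k p_k log2 p_k equals minimizing sum_k p_k log2 p_k.
    def neg_entropy(p):
        p = np.clip(p, 1e-12, 1.0)  # avoid log(0); the 0*log(0) term vanishes
        return float(np.sum(p * np.log2(p)))

    constraints = ({'type': 'eq', 'fun': lambda p: p.sum() - 1.0},)
    x0 = np.clip(np.full(C, 1.0 / C), p_low, p_up)  # rough starting point
    res = minimize(neg_entropy, x0, bounds=list(zip(p_low, p_up)),
                   constraints=constraints, method='SLSQP')
    return -res.fun
```

With vacuous intervals [0, 1] for every class, the maximizer is the uniform distribution, so the result approaches log2(C), the maximum entropy over C classes.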