ExCeL: Combined Extreme and Collective Logit Information for Out-of-Distribution Detection
Authors: Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on CIFAR100, ImageNet-200, and ImageNet-1K datasets demonstrate that ExCeL is consistently among the top five performing methods out of twenty-one existing post-hoc baselines when joint performance on near-OOD and far-OOD is considered (i.e., in terms of AUROC and FPR95). |
| Researcher Affiliation | Academia | Naveen Karunanayake and Suranga Seneviratne, School of Computer Science, The University of Sydney; Sanjay Chawla, Qatar Computing Research Institute, HBKU |
| Pseudocode | No | The paper describes the ExCeL score computation in four steps with mathematical equations (Equations 2, 4, 5, 7, and 8) and prose, but does not present a dedicated, clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | We intend to include ExCeL in the OpenOOD benchmark so that our results can be reproduced and compared with future work. |
| Open Datasets | Yes | We use CIFAR100 (Krizhevsky et al., 2009), ImageNet-200 (a.k.a., Tiny ImageNet) (Le & Yang, 2015), and ImageNet-1K (Deng et al., 2009) as ID data in our experiments. ... For CIFAR100, CIFAR10 and Tiny ImageNet datasets serve as near-OOD, while MNIST (Deng, 2012), SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), and Places365 (Zhou et al., 2017) are considered as far-OOD. Similarly, for both Tiny ImageNet and ImageNet-1K, SSB-hard (Vaze et al., 2021) and NINCO (Bitterwolf et al., 2023) datasets are used as near-OOD, while iNaturalist (Van Horn et al., 2018), Textures (Cimpoi et al., 2014), and OpenImage-O (Wang et al., 2022) datasets are used as far-OOD. |
| Dataset Splits | Yes | For consistency, we adopt the same train, validation, and test splits used by the OpenOOD benchmark in implementing our method. ... We use the validation set to determine these hyperparameters. ... We fine-tune α using a validation set following the same approach used in Section 4.2.2 to fine-tune a and b. |
| Hardware Specification | No | The paper mentions training models (ResNet-18, ResNet-50) but does not provide specific details about the hardware used for training or inference, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'ResNet-18', 'ResNet-50', 'SGD optimiser', and 'TorchVision (maintainers & contributors, 2016)'. However, it does not provide specific version numbers for software components like Python, PyTorch, or any other libraries used. |
| Experiment Setup | Yes | Each model is trained for 100 epochs using the standard cross-entropy loss. We use the SGD optimiser with a momentum of 0.9, a learning rate of 0.1, and a cosine annealing decay schedule (Loshchilov & Hutter, 2016). Furthermore, we incorporate a weight decay of 0.0005, and employ batch sizes of 128 and 256 for CIFAR100 and ImageNet-200, respectively. ... By performing a grid search on a, b, and α, we discovered that the best hyperparameter combination for both the CIFAR100 and ImageNet-200 datasets is a = 10, b = 5, and α = 0.8. For ImageNet-1K, it was a = 10, b = 5, and α = 0.6. |
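The grid search over a, b, and α described in the setup row can be sketched as below. This is a minimal illustration, not the authors' implementation: `validation_auroc` is a hypothetical stand-in for evaluating the ExCeL OOD score with a given hyperparameter combination on the held-out validation split, and the toy objective used in the usage example is purely illustrative.

```python
from itertools import product

def grid_search(validation_auroc, a_grid, b_grid, alpha_grid):
    """Return the (a, b, alpha) combination maximising validation AUROC.

    `validation_auroc` is assumed to score one hyperparameter
    combination on the validation split (higher is better).
    """
    best_combo, best_score = None, float("-inf")
    for a, b, alpha in product(a_grid, b_grid, alpha_grid):
        score = validation_auroc(a, b, alpha)
        if score > best_score:
            best_combo, best_score = (a, b, alpha), score
    return best_combo, best_score

# Toy objective peaking at the paper's reported CIFAR100/ImageNet-200
# optimum (a=10, b=5, alpha=0.8) -- purely illustrative, not real AUROC.
toy = lambda a, b, alpha: -((a - 10) ** 2 + (b - 5) ** 2 + (alpha - 0.8) ** 2)
combo, _ = grid_search(toy, [5, 10, 15], [1, 5, 10], [0.6, 0.8, 1.0])
print(combo)  # -> (10, 5, 0.8)
```

In practice each evaluation would compute AUROC of the ExCeL score against near-OOD/far-OOD validation data, which is far more expensive than this toy objective.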