reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Closer Look at Generalized BH Algorithm for Out-of-Distribution Detection

Authors: Xinsong Ma, Jie Wu, Weiwei Liu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, extensive experimental results validate the effectiveness of our theoretical findings and demonstrate the superiority of our method over g-BH algorithm on small calibrated set.
Researcher Affiliation	Academia	School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China. Correspondence to: Weiwei Liu <EMAIL>.
Pseudocode	Yes	Algorithm 1 eg-BH algorithm
Open Source Code	No	We mainly follow the experimental implementation in (Yang et al., 2022; Zhang et al., 2023a), and our codes are based on (Zhang et al., 2023a).
Open Datasets	Yes	We use CIFAR-10 (Krizhevsky et al., 2009) as ID data, and use CIFAR-100, ImageNet (Krizhevsky et al., 2017), SVHN (Netzer et al., 2011), Fashion-MNIST (F-MNIST) (Xiao et al., 2017), Places365 (Zhou et al., 2018) and MNIST (Deng, 2012), as OOD data.
Dataset Splits	Yes	We first split the training data equally into two parts. One part is employed to train the neural networks for constructing the score function, and the other serves as the largest calibrated set $T^{cal}_M$. Then, from $T^{cal}_M$, we extract samples at various proportions r to construct several relatively smaller calibrated sets, where r = {0.2, 0.3, ..., 1.0}.
Hardware Specification	No	The paper mentions using ResNet18 and WideResNet as models, but no specific hardware, such as GPU or CPU models, is mentioned for running the experiments.
Software Dependencies	No	We mainly follow the experimental implementation in (Yang et al., 2022; Zhang et al., 2023a), and our codes are based on (Zhang et al., 2023a). However, specific versions of software dependencies like Python, PyTorch, or CUDA are not listed.
Experiment Setup	Yes	We choose two famous methods, MSP (Hendrycks & Gimpel, 2017) and Energy (Liu et al., 2020), as the score functions in our method. Model. The score functions in this paper are based on the ResNet18 and WideResNet, respectively. To assess the impact of L on both the g-BH algorithm and eg-BH algorithm, we set L = {6, 7, 8, 9, 10} and conduct the corresponding experiments using Energy as score function based on the ResNet18. We first split the training data equally into two parts... from $T^{cal}_M$, we extract samples at various proportions r to construct several relatively smaller calibrated sets, where r = {0.2, 0.3, ..., 1.0}.
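
The split described under Dataset Splits can be sketched as follows. This is only an illustrative reading of the quoted text, not the authors' code: the function name `split_and_subsample`, the fixed seed, the use of index lists, and the choice to take each smaller calibrated set as a prefix of the shuffled full calibrated set are all assumptions. The 50,000-sample size corresponds to the standard CIFAR-10 training set.

```python
import random


def split_and_subsample(indices, proportions, seed=0):
    """Halve the training indices, then build smaller calibrated sets.

    Hypothetical helper: the paper does not say how samples are drawn,
    so each subset here is simply a prefix of the shuffled calibrated half.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    train_part = shuffled[:half]   # used to train the network behind the score function
    cal_full = shuffled[half:]     # the largest calibrated set (r = 1.0)

    cal_sets = {}
    for r in proportions:
        k = round(r * len(cal_full))  # number of calibrated samples at proportion r
        cal_sets[r] = cal_full[:k]
    return train_part, cal_full, cal_sets


# Proportions r = 0.2, 0.3, ..., 1.0 as quoted above.
proportions = [round(0.1 * i, 1) for i in range(2, 11)]

# 50,000 is the size of the standard CIFAR-10 training set.
train_part, cal_full, cal_sets = split_and_subsample(range(50000), proportions)
```

Under these assumptions the calibrated sets are nested, so results at different values of r differ only in how many calibrated samples are available, not in which half of the data they come from.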