An Online Statistical Framework for Out-of-Distribution Detection
Authors: Xinsong Ma, Xin Zou, Weiwei Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the extensive experimental results verify the effectiveness of the g-LOND algorithm for OOD detection. ... The experimental results with CIFAR-100 as the ID data are presented in Table 1, and the results with ImageNet-200 as the ID data are presented in Table 2. |
| Researcher Affiliation | Academia | 1School of Computer Science, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China. Correspondence to: Weiwei Liu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Practical g-LOND Algorithm). 1: Input: training set T, calibration set T^cal = {X^cal_1, X^cal_2, ..., X^cal_m}, testing set T^test = {X^test_1, X^test_2, ..., X^test_n}, prescribed level α ∈ (0, 1), and sequence {γ_i}_{i≥1}. 2: Train the score function on T: ŝ(x) = max_{i ∈ [K]} f_i(x), where f is the logit output. 3: Calculate the p-value corresponding to X^test_i: p_i = p(X^test_i) = (\|{j ∈ [m] : ŝ(X^cal_j) ≤ ŝ(X^test_i)}\| + 1) / (m + 1) (Eq. 5), where ŝ(·) is a certain score function. 4: Compute the significance level corresponding to X^test_i: α_i = α γ_i (D(i−1) + 1). 5: Output: Declare that X^test_i is OOD if p_i ≤ α_i. |
| Open Source Code | No | We mainly follow the experimental settings in (Yang et al., 2022; Zhang et al., 2023b), and our codes are based on (Zhang et al., 2023b). More details can be found in Yang et al. (2022); Zhang et al. (2023b). |
| Open Datasets | Yes | We use CIFAR-100 (Krizhevsky, 2009) and ImageNet-200 (Deng et al., 2009) as ID data. For CIFAR-100, we use CIFAR-10, TinyImageNet (Krizhevsky et al., 2017), SVHN (Netzer et al., 2011), Texture (Kylberg, 2011), and Places365 (Zhou et al., 2018) as OOD data. For ImageNet-200, we use SSB-hard (Vaze et al., 2022; Zhang et al., 2023b), NINCO (Bitterwolf et al., 2023), iNaturalist (Horn et al., 2018), Textures (Cimpoi et al., 2014), and OpenImage-O (Wang et al., 2022) as OOD data. |
| Dataset Splits | No | The paper mentions a 'calibrated set' (Tcal) used in Algorithm 1 and refers to a 'validation set' in section 3.1 for threshold selection, but does not provide specific percentages, sample counts, or explicit methodology for splitting the datasets into training, validation, and test sets. It defers to external papers for more details, stating 'We mainly follow the experimental settings in (Yang et al., 2022; Zhang et al., 2023b).' |
| Hardware Specification | No | The paper mentions using 'ResNet18' and 'ResNet50' as backbones for the methods but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper states 'our codes are based on (Zhang et al., 2023b)' and mentions using 'ResNet18' and 'ResNet50' as backbones. However, it does not list any specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used for the implementation. |
| Experiment Setup | No | The paper describes using certain models (ResNet18, ResNet50) as backbones and lists baseline methods for comparison. However, it does not explicitly provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific training configurations (e.g., optimizer details) in the main text. It states 'More details can be found in Yang et al. (2022); Zhang et al. (2023b).' |
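The pseudocode quoted above (conformal p-values from a calibration set, followed by LOND-style online significance levels) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the score direction (lower score suggests OOD, as with an MSP-style score), the default `gamma` sequence, and all function names are assumptions for illustration only.

```python
import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Conformal p-value as in Eq. (5): (#{calibration scores <= test score} + 1) / (m + 1).
    Assumes lower scores are more OOD-like, so a small p-value suggests OOD."""
    m = len(cal_scores)
    return (np.sum(np.asarray(cal_scores) <= test_score) + 1) / (m + 1)

def g_lond(p_values, alpha=0.05, gamma=None):
    """LOND-style online testing sketch: declare the i-th test point OOD
    if p_i <= alpha * gamma_i * (D(i-1) + 1), where D(i-1) counts
    discoveries among the first i-1 points."""
    n = len(p_values)
    if gamma is None:
        # Illustrative choice: gamma_i proportional to 1/i^2, normalized to sum to 1.
        weights = 1.0 / np.arange(1, n + 1) ** 2
        gamma = weights / weights.sum()
    decisions = []
    d = 0  # number of discoveries (OOD declarations) so far
    for i, p in enumerate(p_values):
        alpha_i = alpha * gamma[i] * (d + 1)
        reject = bool(p <= alpha_i)
        decisions.append(reject)
        d += int(reject)
    return decisions
```

Note how the per-test level α_i grows with the discovery count D(i−1): each rejection earns additional significance budget for later tests, which is what lets the online procedure keep power as the test stream lengthens.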