reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MetaOOD: Automatic Selection of OOD Detection Models

Authors: Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding, Yue Zhao

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experimentation with 24 unique test dataset pairs to choose from among 11 OOD detection models, we demonstrate that MetaOOD significantly outperforms existing methods and only brings marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods.
Researcher Affiliation	Academia	University of Southern California; University of Chicago; Carnegie Mellon University
Pseudocode	Yes	A.1 Pseudo-code for meta-train and online model selection. We discussed meta-training and online model selection in Sections 3.3 and 3.4, respectively. Here is the pseudo-code for the two phases: Algorithm 1 (offline OOD detection meta-learner training) and Algorithm 2 (online OOD detection model selection).
Open Source Code	Yes	Accessibility and Reproducibility. We release the testbed, corresponding code, and the proposed meta-learner at https://github.com/yqin43/metaood.
Open Datasets	Yes	ID datasets: CIFAR10 (Krizhevsky, 2009), CIFAR100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009), Fashion MNIST (Xiao et al., 2017). Classic OOD group: CIFAR10, CIFAR100, MNIST (Deng, 2012), Places365 (Zhou et al., 2018), SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), TIN (Le & Yang, 2015). Large-scale OOD group: SSB-hard (Vaze et al., 2022), NINCO (Bitterwolf et al., 2023), iNaturalist (Horn et al., 2017), Textures (Cimpoi et al., 2014), OpenImage-O (Wang et al., 2022).
Dataset Splits	Yes	We utilize the train-test split of datasets preprocessed as described by Yang et al. (2022). To summarize, we create our ID-OOD dataset pairs using the following datasets: 1. ID Datasets: CIFAR10 (Krizhevsky, 2009), CIFAR100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009), Fashion MNIST (Xiao et al., 2017) ... We construct the ID-OOD dataset pairs and set the training and testing sets as follows: (i) training: CIFAR10 as ID and OOD from the classic OOD group shown above; (ii) testing: CIFAR100, ImageNet, and Fashion MNIST as ID, and OOD from the large-scale OOD dataset group.
Hardware Specification	Yes	Hardware. For consistency, all models are built using the pytorch-ood library (Kirchheim et al., 2022) on NVIDIA RTX 6000 Ada, 48 GB RAM workstations.
Software Dependencies	No	The paper mentions the 'pytorch-ood library', an 'XGBoost model', and the 'BERT-based all-mpnet-base-v2 model by Hugging Face' but does not provide version numbers for these software components or libraries, which are required for reproducibility.
Experiment Setup	Yes	Section B.1, Prompts to LLM for zero-shot selection of the optimal OOD detector: ... To ensure consistency, we set the temperature parameter to 0 and the top-p parameter to 0.999.
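The Dataset Splits entry can be read as a simple enumeration of ID-OOD pairs. The Python sketch below illustrates one literal reading of the quoted description: each training pair uses CIFAR10 as ID with an OOD set from the classic group, and each testing pair uses CIFAR100, ImageNet, or Fashion MNIST as ID with an OOD set from the large-scale group. Dropping self-pairs such as CIFAR10/CIFAR10 is an assumption of this sketch, and the helper `build_pairs` is hypothetical, not part of the released code at https://github.com/yqin43/metaood. The counts it produces need not match the pair list actually used in the paper.

```python
from itertools import product

# Dataset groups as listed in the Open Datasets entry.
CLASSIC_OOD = ["CIFAR10", "CIFAR100", "MNIST", "Places365", "SVHN", "Textures", "TIN"]
LARGE_SCALE_OOD = ["SSB-hard", "NINCO", "iNaturalist", "Textures", "OpenImage-O"]

# ID datasets used on each side of the split, per the Dataset Splits entry.
TRAIN_ID = ["CIFAR10"]
TEST_ID = ["CIFAR100", "ImageNet", "Fashion MNIST"]


def build_pairs(id_names, ood_names):
    """Hypothetical helper: return every (ID, OOD) combination.

    Pairs whose ID and OOD names coincide (e.g. CIFAR10 vs CIFAR10) are skipped;
    this exclusion is an assumption of the sketch, not a documented rule.
    """
    return [(i, o) for i, o in product(id_names, ood_names) if i != o]


# (i) Training pairs: CIFAR10 as ID, OOD from the classic group.
train_pairs = build_pairs(TRAIN_ID, CLASSIC_OOD)
# (ii) Testing pairs: three ID datasets, OOD from the large-scale group.
test_pairs = build_pairs(TEST_ID, LARGE_SCALE_OOD)

print(len(train_pairs), len(test_pairs))  # 6 15
```

Because the training and testing ID datasets are disjoint, no (ID, OOD) pair appears on both sides, so under this reading the testing pairs are tasks the meta-learner has not seen during meta-training.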