MetaOOD: Automatic Selection of OOD Detection Models
Authors: Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding, Yue Zhao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experimentation with 24 unique test dataset pairs to choose from among 11 OOD detection models, we demonstrate that MetaOOD significantly outperforms existing methods and only brings marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods. |
| Researcher Affiliation | Academia | (1) University of Southern California; (2) University of Chicago; (3) Carnegie Mellon University |
| Pseudocode | Yes | A.1 Pseudo-code for Meta-train and Online Model Selection: We discussed meta-training and online model selection in Sections 3.3 and 3.4, respectively. Here is the pseudo-code for the two phases. Algorithm 1: Offline OOD detection meta-learner training. Algorithm 2: Online OOD detection model selection. |
| Open Source Code | Yes | Accessibility and Reproducibility. We release the testbed, corresponding code, and the proposed meta-learner at https://github.com/yqin43/metaood. |
| Open Datasets | Yes | ID Datasets: CIFAR10 (Krizhevsky, 2009), CIFAR100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009), FashionMNIST (Xiao et al., 2017). Classic OOD Group: CIFAR10, CIFAR100, MNIST (Deng, 2012), Places365 (Zhou et al., 2018), SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), TIN (Le & Yang, 2015). Large-Scale OOD Group: SSB-hard (Vaze et al., 2022), NINCO (Bitterwolf et al., 2023), iNaturalist (Horn et al., 2017), Textures (Cimpoi et al., 2014), OpenImage-O (Wang et al., 2022). |
| Dataset Splits | Yes | We utilize the train-test split of datasets preprocessed as described in Yang et al. (2022). To summarize, we create our ID-OOD dataset pairs using the following datasets: 1. ID Datasets: CIFAR10 (Krizhevsky, 2009), CIFAR100 (Krizhevsky, 2009), ImageNet (Deng et al., 2009), FashionMNIST (Xiao et al., 2017) ... We construct the ID-OOD dataset pair, and set the training and testing set as follows: (i) Training: CIFAR10 from ID and OOD from the classic OOD group shown above; and (ii) Testing: CIFAR100, ImageNet, and FashionMNIST from ID, and OOD from the large-scale OOD dataset group. |
| Hardware Specification | Yes | Hardware. For consistency, all models are built using the pytorch-ood library (Kirchheim et al., 2022) on NVIDIA RTX 6000 Ada, 48 GB RAM workstations. |
| Software Dependencies | No | The paper mentions 'pytorch-ood library' and 'XGBoost model' and 'BERT-based all-mpnet-base-v2 model by Hugging Face' but does not provide specific version numbers for these software components or libraries, which is required for reproducibility. |
| Experiment Setup | Yes | Section B.1, Prompts to LLM for Zero-Shot Selection of the Optimal OOD Detector: ... To ensure consistency, we set temperature parameter to 0, and top_p parameter to 0.999. |
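
The train/test construction quoted in the Dataset Splits row can be sketched as a Cartesian product of ID and OOD dataset names. This is a minimal illustration, not the authors' code: the function name `make_pairs` and the rule of skipping a dataset paired with itself are assumptions, and the dataset lists are copied only from the rows above. The number of pairs it produces follows from those lists alone and need not match the number of pairs evaluated in the paper.

```python
from itertools import product

# Dataset names as listed in the Open Datasets / Dataset Splits rows.
ID_TRAIN = ["CIFAR10"]
ID_TEST = ["CIFAR100", "ImageNet", "FashionMNIST"]
CLASSIC_OOD = ["CIFAR10", "CIFAR100", "MNIST", "Places365",
               "SVHN", "Textures", "TIN"]
LARGE_SCALE_OOD = ["SSB-hard", "NINCO", "iNaturalist",
                   "Textures", "OpenImage-O"]


def make_pairs(id_names, ood_names):
    """Return (ID, OOD) name pairs.

    Assumption (not stated in the table): a dataset is never paired
    with itself, so (CIFAR10, CIFAR10) is dropped.
    """
    return [(i, o) for i, o in product(id_names, ood_names) if i != o]


# (i) Training: CIFAR10 as ID, OOD from the classic group.
train_pairs = make_pairs(ID_TRAIN, CLASSIC_OOD)
# (ii) Testing: remaining ID datasets, OOD from the large-scale group.
test_pairs = make_pairs(ID_TEST, LARGE_SCALE_OOD)

if __name__ == "__main__":
    for split, pairs in (("train", train_pairs), ("test", test_pairs)):
        print(split, len(pairs))
        for id_name, ood_name in pairs:
            print(f"  ID={id_name:<13} OOD={ood_name}")
```

Keeping the two splits as separate lists makes it easy to check that no pair used for meta-training reappears at test time.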