Position: Supervised Classifiers Answer the Wrong Questions for OOD Detection

Authors: Yucen Lily Li, Daohan Lu, Polina Kirichenko, Shikai Qiu, Tim G. J. Rudner, C. Bayan Bruss, Andrew Gordon Wilson

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate this lack of separability between ID and OOD features, we study four different models trained on ImageNet-1k: ResNet-18, ResNet-50, ViT-S/16, and ViT-B/16, with the OOD datasets of ImageNet-OOD (Yang et al., 2024b), Textures (Cimpoi et al., 2014), and iNaturalist (Van Horn et al., 2018). For each setting, we train an Oracle, a binary linear classifier, to differentiate between examples of ID features and OOD features and report its performance on held-out ID and OOD features.
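The "Oracle" described above is a binary linear classifier fit on extracted features. The paper does not specify its training procedure, so the following is a minimal sketch using plain logistic regression on NumPy feature arrays; the function names (`train_oracle`, `oracle_accuracy`) are hypothetical, not from the released code.

```python
import numpy as np

def train_oracle(id_feats, ood_feats, epochs=200, lr=0.1):
    """Fit a binary linear classifier (logistic regression) that labels
    ID features 0 and OOD features 1, via full-batch gradient descent."""
    X = np.vstack([id_feats, ood_feats])
    y = np.concatenate([np.zeros(len(id_feats)), np.ones(len(ood_feats))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of mean BCE loss
        b -= lr * np.mean(p - y)
    return w, b

def oracle_accuracy(w, b, id_feats, ood_feats):
    """Accuracy of the oracle on held-out ID (label 0) / OOD (label 1) features."""
    X = np.vstack([id_feats, ood_feats])
    y = np.concatenate([np.zeros(len(id_feats)), np.ones(len(ood_feats))])
    return float(np.mean(((X @ w + b) > 0).astype(float) == y))
```

High held-out Oracle accuracy would indicate that ID and OOD features are linearly separable; the paper's point is that this separability is often lacking.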
Researcher Affiliation | Collaboration | ¹New York University, ²Capital One. Correspondence to: Yucen Lily Li <EMAIL>, Andrew Gordon Wilson <EMAIL>.
Pseudocode | No | The paper describes methods and analyses but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce our experiments can be found at https://github.com/yucenli/ood-pathologies.
Open Datasets | Yes | To demonstrate this lack of separability between ID and OOD features, we study four different models trained on ImageNet-1k: ResNet-18, ResNet-50, ViT-S/16, and ViT-B/16, with the OOD datasets of ImageNet-OOD (Yang et al., 2024b), Textures (Cimpoi et al., 2014), and iNaturalist (Van Horn et al., 2018).
Dataset Splits | No | The paper frequently refers to 'training data', 'test data', 'in-distribution data', and 'OOD data', and mentions using subsets or specific classes from datasets (e.g., 'To evaluate the model on STL-10, we only use the 9 classes which overlap with CIFAR-10 classes'), but it does not provide explicit numerical splits (percentages or exact train/validation/test counts) beyond the use of standard benchmark datasets.
Hardware Specification | No | The paper acknowledges 'NYU IT High Performance Computing resources, services, and staff expertise' but does not specify any particular hardware, such as GPU or CPU models or memory, used for the experiments.
Software Dependencies | No | The paper mentions adapting the 'OpenOOD codebase (Zhang et al., 2023; Yang et al., 2022)' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We train models for 100 epochs with batch size 128 for ID data and batch size 256 for the outlier dataset, using SGD with momentum, initial learning rate 0.1, and weight decay 5×10⁻⁴, and we set the coefficient before the OE loss to α = 0.5 (overall, we use standard training hyper-parameters as in Zhang et al. (2023)).
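The OE (Outlier Exposure) term referenced above penalizes confident predictions on outlier data. The standard formulation (Hendrycks et al., 2019) adds α times the cross-entropy between the model's outlier predictions and the uniform distribution to the usual ID cross-entropy. Below is a minimal NumPy sketch of that objective with α = 0.5 as in the setup; the function names are hypothetical and this is not the paper's released implementation.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class dimension."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def oe_loss(id_logits, id_labels, outlier_logits, alpha=0.5):
    """Outlier Exposure objective: cross-entropy on ID data plus
    alpha * cross-entropy(outlier predictions, uniform distribution)."""
    ls_id = log_softmax(id_logits)
    ce = -np.mean(ls_id[np.arange(len(id_labels)), id_labels])
    # CE to the uniform target 1/K equals the negative mean log-probability
    # over all (example, class) entries.
    oe = -np.mean(log_softmax(outlier_logits))
    return ce + alpha * oe
```

The OE term is minimized when the model's outlier predictions are uniform, so confident predictions on outliers are penalized while ID classification is unaffected.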