Position: Supervised Classifiers Answer the Wrong Questions for OOD Detection
Authors: Yucen Lily Li, Daohan Lu, Polina Kirichenko, Shikai Qiu, Tim G. J. Rudner, C. Bayan Bruss, Andrew Gordon Wilson
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate this lack of separability between ID and OOD features, we study four different models trained on ImageNet-1k: ResNet-18, ResNet-50, ViT-S/16, and ViT-B/16, with the OOD datasets of ImageNet-OOD (Yang et al., 2024b), Textures (Cimpoi et al., 2014), and iNaturalist (Van Horn et al., 2018). For each setting, we train an Oracle, a binary linear classifier, to differentiate between examples of ID features and OOD features and report its performance on held-out ID and OOD features. |
| Researcher Affiliation | Collaboration | 1New York University 2Capital One. Correspondence to: Yucen Lily Li <EMAIL>, Andrew Gordon Wilson <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce our experiments can be found at https://github.com/yucenli/ood-pathologies. |
| Open Datasets | Yes | To demonstrate this lack of separability between ID and OOD features, we study four different models trained on ImageNet-1k: ResNet-18, ResNet-50, ViT-S/16, and ViT-B/16, with the OOD datasets of ImageNet-OOD (Yang et al., 2024b), Textures (Cimpoi et al., 2014), and iNaturalist (Van Horn et al., 2018). |
| Dataset Splits | No | The paper frequently refers to 'training data', 'test data', 'in-distribution data', and 'OOD data', and mentions using subsets or specific classes from datasets (e.g., 'To evaluate the model on STL-10, we only use the 9 classes which overlap with CIFAR-10 classes'). However, beyond relying on standard benchmark datasets, it does not provide explicit numerical splits (e.g., percentages or exact train/validation/test counts) for reproducibility. |
| Hardware Specification | No | The paper mentions 'NYU IT High Performance Computing resources, services, and staff expertise.' However, it does not specify any particular hardware components such as GPU or CPU models, or memory details used for the experiments. |
| Software Dependencies | No | The paper mentions adapting the 'OpenOOD codebase (Zhang et al., 2023; Yang et al., 2022)' but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train models for 100 epochs with batch size 128 for ID data and batch size 256 for the outlier dataset, SGD with momentum and initial learning rate 0.1 and weight decay 5×10⁻⁴, and we set the coefficient before the OE loss to α = 0.5 (overall, we use standard training hyper-parameters as in Zhang et al. (2023)). |
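The Experiment Setup row references an Outlier Exposure (OE) term weighted by α = 0.5. As a minimal NumPy sketch of that objective (the function and variable names here are illustrative, not drawn from the paper's codebase), the combined loss is standard cross-entropy on the ID batch plus α times the cross-entropy of the OOD predictions against the uniform distribution:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def oe_loss(id_logits, id_labels, ood_logits, alpha=0.5):
    """Cross-entropy on ID examples plus alpha * the OE term:
    cross-entropy of OOD predictions to the uniform distribution."""
    p_id = softmax(id_logits)
    ce = -np.log(p_id[np.arange(len(id_labels)), id_labels]).mean()
    # CE to the uniform target over K classes: -(1/K) * sum_k log p_k
    log_p_ood = np.log(softmax(ood_logits))
    oe = -log_p_ood.mean(axis=1).mean()
    return ce + alpha * oe
```

With uniform logits (all zeros) over K classes, both terms equal log K, so the loss is (1 + α) log K; this gives a quick sanity check when wiring the objective into a training loop.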