Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

Authors: Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments over representative OOD detection setups, achieving the SOTA performance on CIFAR10, CIFAR100 and ImageNet benchmarks without any additional data augmentation or sampling, demonstrating the importance of feature separation in OOD detection.
Researcher Affiliation | Academia | Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University
Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/Wuyingwen/Pursuing-Feature-Separation-for-OOD-Detection.
Open Datasets | Yes | For CIFAR benchmarks, we randomly choose 300K samples from the 80 Million Tiny Images (Torralba et al., 2008) as our auxiliary OOD dataset. And we adopt five routinely used datasets as the test OOD datasets, including SVHN (Netzer et al., 2011), LSUN (Yu et al., 2015), iSUN (Xu et al., 2015), Texture (Cimpoi et al., 2014) and Places365 (Zhou et al., 2017), which have non-overlapping categories w.r.t. CIFAR datasets. For the ImageNet benchmark, we use a validation subset of the ImageNet-21k-p dataset as the auxiliary OOD dataset. And we adopt four commonly-used OOD datasets for evaluation, including iNaturalist (Van Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2017) and Textures (Cimpoi et al., 2014).
Dataset Splits | Yes | For CIFAR benchmarks, we randomly choose 300K samples from the 80 Million Tiny Images (Torralba et al., 2008) as our auxiliary OOD dataset. And we adopt five routinely used datasets as the test OOD datasets... For the ImageNet benchmark, we use a validation subset of the ImageNet-21k-p dataset as the auxiliary OOD dataset. And we adopt four commonly-used OOD datasets for evaluation... The threshold is usually set based on ID data to ensure that a high fraction of ID data (e.g., 95%) is correctly identified as ID samples. Fine-tuning Setups: For both CIFAR10 and CIFAR100 benchmarks... with ID batch size 128, OOD batch size 256.
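The 95%-TPR thresholding rule quoted above can be sketched as follows. The score distribution here is a hypothetical stand-in; in practice the scores come from the detector evaluated on held-out ID data, and only the percentile rule is taken from the paper's description:

```python
import random

# Hypothetical ID detection scores (higher = more ID-like); a stand-in
# for the detector's scores on held-out in-distribution data.
random.seed(0)
id_scores = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# Choose the threshold so that 95% of ID samples score above it (95% TPR):
# at test time, samples scoring below the threshold are flagged as OOD.
id_scores_sorted = sorted(id_scores)
threshold = id_scores_sorted[int(0.05 * len(id_scores_sorted))]

tpr = sum(s >= threshold for s in id_scores) / len(id_scores)
print(f"threshold={threshold:.3f}, fraction of ID kept={tpr:.3f}")
```

With the threshold fixed this way, OOD detection metrics such as FPR95 measure how many OOD samples still score above it.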
Hardware Specification | No | The paper specifies model architectures used (e.g., WideResNet-40-2, ResNet50) but does not provide specific details about the hardware components (like GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | For ImageNet benchmarks, we directly use the pre-trained ResNet50 (He et al., 2016) model in PyTorch as the baseline network. While PyTorch is mentioned, no specific version number is provided for any software dependency.
Experiment Setup | Yes | For CIFAR benchmarks, we employ WideResNet-40-2 (Zagoruyko & Komodakis, 2016) trained for 200 epochs, with batch size 128, init learning rate 0.1, momentum 0.9, weight decay 0.0005, and cosine schedule. For ImageNet benchmarks, we directly use the pre-trained ResNet50 (He et al., 2016) model in PyTorch as the baseline network. Fine-tuning Setups: For both CIFAR10 and CIFAR100 benchmarks... train the model for 50 epochs with ID batch size 128, OOD batch size 256, initial learning rate 0.07, momentum 0.9, weight decay 0.0005 and cosine schedule. For the ImageNet benchmark... fine-tune the model for 5 epochs with ID/OOD batch size 64, initial learning rate 1e-4, momentum 0.9, weight decay 0.0005 and cosine schedule. In our experiments, we use the common setting λ = 0.5 in previous works (Hendrycks et al., 2018) and set α = 1.0 and β = 1.0 for simplicity.
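The reported CIFAR fine-tuning hyperparameters can be collected into a minimal sketch like the one below. The exact annealing formula is an assumption (the paper only states "cosine schedule"); the standard form decays the learning rate from its initial value to zero over the full run:

```python
import math

# Hyperparameters reported for CIFAR fine-tuning; lam/alpha/beta are the
# paper's λ, α, β. The schedule implementation itself is an assumption.
cfg = dict(epochs=50, id_batch=128, ood_batch=256,
           lr0=0.07, momentum=0.9, weight_decay=5e-4,
           lam=0.5, alpha=1.0, beta=1.0)

def cosine_lr(epoch, cfg):
    """Cosine annealing from lr0 at epoch 0 down to 0 at the final epoch."""
    return 0.5 * cfg["lr0"] * (1 + math.cos(math.pi * epoch / cfg["epochs"]))

print(round(cosine_lr(0, cfg), 4))               # initial learning rate
print(round(cosine_lr(cfg["epochs"], cfg), 6))   # decayed to ~0 at the end
```

For the ImageNet setup, the same sketch would use `epochs=5`, `lr0=1e-4`, and ID/OOD batch size 64.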