Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

Authors: Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments over representative OOD detection setups, achieving the SOTA performance on CIFAR10, CIFAR100 and ImageNet benchmarks without any additional data augmentation or sampling, demonstrating the importance of feature separation in OOD detection.
Researcher Affiliation | Academia | Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University
Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/Wuyingwen/Pursuing-Feature-Separation-for-OOD-Detection.
Open Datasets | Yes | For CIFAR benchmarks, we randomly choose 300K samples from the 80 Million Tiny Images (Torralba et al., 2008) as our auxiliary OOD dataset. And we adopt five routinely used datasets as the test OOD datasets, including SVHN (Netzer et al., 2011), LSUN (Yu et al., 2015), iSUN (Xu et al., 2015), Texture (Cimpoi et al., 2014) and Places365 (Zhou et al., 2017), which have non-overlapping categories w.r.t. CIFAR datasets. For the ImageNet benchmark, we use a validation subset of the ImageNet-21k-p dataset as the auxiliary OOD dataset. And we adopt four commonly-used OOD datasets for evaluation, including iNaturalist (Van Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2017) and Textures (Cimpoi et al., 2014).
Dataset Splits | Yes | For CIFAR benchmarks, we randomly choose 300K samples from the 80 Million Tiny Images (Torralba et al., 2008) as our auxiliary OOD dataset. And we adopt five routinely used datasets as the test OOD datasets... For the ImageNet benchmark, we use a validation subset of the ImageNet-21k-p dataset as the auxiliary OOD dataset. And we adopt four commonly-used OOD datasets for evaluation... The threshold is usually set based on ID data to ensure that a high fraction of ID data (e.g., 95%) is correctly identified as ID samples. Fine-tuning Setups: For both CIFAR10 and CIFAR100 benchmarks... with ID batch size 128, OOD batch size 256.
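The 95%-TPR thresholding rule quoted above can be sketched as follows. The score distribution here is a hypothetical stand-in; in practice the scores come from the detector evaluated on held-out ID data, and only the percentile rule is taken from the paper's description:

```python
import random

# Hypothetical ID detection scores (higher = more ID-like); a stand-in
# for the detector's scores on held-out in-distribution data.
random.seed(0)
id_scores = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# Choose the threshold so that 95% of ID samples score above it (95% TPR):
# at test time, samples scoring below the threshold are flagged as OOD.
id_scores_sorted = sorted(id_scores)
threshold = id_scores_sorted[int(0.05 * len(id_scores_sorted))]

tpr = sum(s >= threshold for s in id_scores) / len(id_scores)
print(f"threshold={threshold:.3f}, fraction of ID kept={tpr:.3f}")
```

With the threshold fixed this way, OOD detection metrics such as FPR95 measure how many OOD samples still score above it.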
Hardware Specification | No | The paper specifies model architectures used (e.g., WideResNet-40-2, ResNet50) but does not provide specific details about the hardware components (like GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | For ImageNet benchmarks, we directly use the pre-trained ResNet50 (He et al., 2016) model in PyTorch as the baseline network. While PyTorch is mentioned, no specific version number is provided for any software dependency.
Experiment Setup | Yes | For CIFAR benchmarks, we employ WideResNet-40-2 (Zagoruyko & Komodakis, 2016) trained for 200 epochs, with batch size 128, init learning rate 0.1, momentum 0.9, weight decay 0.0005, and cosine schedule. For ImageNet benchmarks, we directly use the pre-trained ResNet50 (He et al., 2016) model in PyTorch as the baseline network. Fine-tuning Setups: For both CIFAR10 and CIFAR100 benchmarks... train the model for 50 epochs with ID batch size 128, OOD batch size 256, initial learning rate 0.07, momentum 0.9, weight decay 0.0005 and cosine schedule. For the ImageNet benchmark... fine-tune the model for 5 epochs with ID/OOD batch size 64, initial learning rate 1e-4, momentum 0.9, weight decay 0.0005 and cosine schedule. In our experiments, we use the common setting λ = 0.5 in previous works (Hendrycks et al., 2018) and set α = 1.0 and β = 1.0 for simplicity.
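The reported CIFAR fine-tuning hyperparameters can be collected into a minimal sketch like the one below. The exact annealing formula is an assumption (the paper only states "cosine schedule"); the standard form decays the learning rate from its initial value to zero over the full run:

```python
import math

# Hyperparameters reported for CIFAR fine-tuning; lam/alpha/beta are the
# paper's λ, α, β. The schedule implementation itself is an assumption.
cfg = dict(epochs=50, id_batch=128, ood_batch=256,
           lr0=0.07, momentum=0.9, weight_decay=5e-4,
           lam=0.5, alpha=1.0, beta=1.0)

def cosine_lr(epoch, cfg):
    """Cosine annealing from lr0 at epoch 0 down to 0 at the final epoch."""
    return 0.5 * cfg["lr0"] * (1 + math.cos(math.pi * epoch / cfg["epochs"]))

print(round(cosine_lr(0, cfg), 4))               # initial learning rate
print(round(cosine_lr(cfg["epochs"], cfg), 6))   # decayed to ~0 at the end
```

For the ImageNet setup, the same sketch would use `epochs=5`, `lr0=1e-4`, and ID/OOD batch size 64.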