Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

Authors: Md Yousuf Harun, Jhair Gallardo, Christopher Kanan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that the degree of Neural Collapse (NC) in a network layer is inversely related to these objectives: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection. ... In experiments, our method excels at both tasks across OOD datasets and DNN architectures. ... Our key contributions are as follows: ... 3. In extensive experiments on diverse OOD datasets and DNN architectures, we demonstrate the efficacy of our method compared to baselines."
Researcher Affiliation | Academia | "¹Rochester Institute of Technology, ²University of Rochester. Correspondence to: Md Yousuf Harun <EMAIL>."
Pseudocode | No | The paper describes its methods in paragraph text and mathematical formulations. It does not contain an explicit figure, block, or section labeled "Pseudocode" or "Algorithm", nor structured steps formatted like code.
Open Source Code | Yes | "Code: https://yousuf907.github.io/ncoodg"
Open Datasets | Yes | "For the ID dataset, we use ImageNet-100 (Tian et al., 2020), a subset (100 classes) of ImageNet-1K (Russakovsky et al., 2015). To assess OOD generalization and OOD detection, we study eight commonly used OOD datasets: NINCO (Bitterwolf et al., 2023), ImageNet-R (Hendrycks et al., 2021), CIFAR-100 (Krizhevsky & Hinton, 2014), Oxford 102 Flowers (Nilsback & Zisserman, 2008), CUB200 (Wah et al., 2011), Aircrafts (Maji et al., 2013), Oxford IIIT Pets (Parkhi et al., 2012), and STL-10 (Coates et al., 2011)."
Dataset Splits | Yes | "For NINCO, the dataset has 5878 samples, and we split it into 4702 samples for training and 1176 samples for evaluation. ... CIFAR-100 is split into 50,000 training samples and 10,000 test samples. ... Aircrafts ... The training and test sets contain 6667 and 3333 images respectively. ... STL-10 has 10 classes with 500 training images and 800 test images per class."
Hardware Specification | Yes | "When training DNNs on ImageNet-100 (ID dataset) for 100 epochs using four NVIDIA RTX A5000 GPUs, both our method and the baseline require almost the same training time (see Table 23). ... For FLOPs analysis, we use DeepSpeed with the same GPU (single NVIDIA RTX A5000) across compared models."
Software Dependencies | No | The paper mentions DeepSpeed but does not provide version numbers for it or for other key software components such as Python, PyTorch, or TensorFlow. A single mention of a tool, without the remaining dependencies and their versions, is not sufficient for a reproducible software description.
Experiment Setup | Yes | "In our main experiments, we train different DNN architectures, e.g., VGG17, ResNet18, and ViT-T, on ImageNet-100 for 100 epochs. The entropy regularization loss L_reg is modulated with α = 0.05. We use the AdamW (Loshchilov, 2017) optimizer and a cosine learning rate scheduler with a linear warmup of 5 epochs. For a batch size of 512, we set the learning rate to 6 × 10⁻³ for VGG17, 0.01 for ResNet18, and 8 × 10⁻⁴ for ViT-T. For all models, we set the weight decay to 0.05 and the label smoothing to 0.1. In all our experiments, we use 224 × 224 images. And, we use random resized crop and random horizontal flip augmentations. Linear probes are attached to the encoder and projector layers of a pre-trained model and trained on extracted embeddings of OOD data using the AdamW optimizer and CE loss for 30 epochs."
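The excerpt does not define how the "degree of Neural Collapse" in a layer is measured. A common NC1-style proxy is the ratio of within-class to between-class feature variability; the sketch below is an illustrative stand-in under that assumption, not the paper's actual metric:

```python
# Simplified NC1-style proxy: within-class variability divided by
# between-class variability of feature vectors. Lower values mean
# stronger neural collapse (samples concentrate on their class means).
# Illustrative only; the paper's exact metric is not given in this excerpt.

def mean(vectors):
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nc1_proxy(features_by_class):
    """features_by_class: {class_label: [feature_vector, ...]}"""
    class_means = {c: mean(vs) for c, vs in features_by_class.items()}
    global_mean = mean(list(class_means.values()))
    # Average squared distance of samples to their own class mean.
    within = sum(sq_dist(v, class_means[c])
                 for c, vs in features_by_class.items() for v in vs)
    within /= sum(len(vs) for vs in features_by_class.values())
    # Average squared distance of class means to the global mean.
    between = sum(sq_dist(m, global_mean) for m in class_means.values())
    between /= len(class_means)
    return within / between

# Fully collapsed features (every sample sits on its class mean) give 0.
collapsed = {0: [[1.0, 0.0], [1.0, 0.0]], 1: [[0.0, 1.0], [0.0, 1.0]]}
spread = {0: [[1.0, 0.0], [0.6, 0.4]], 1: [[0.0, 1.0], [0.4, 0.6]]}
```

A lower proxy value here corresponds to the "stronger NC" regime the quoted claim associates with better OOD detection.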
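The NINCO split quoted under Dataset Splits (4702 train / 1176 eval out of 5878) corresponds to an 80/20 partition. A minimal sketch of such a split, assuming a random shuffle (the paper does not state how the partition was drawn):

```python
import random

def split_80_20(n_samples, seed=0):
    """Shuffle sample indices and return (train, eval) index lists.

    Illustrative 80/20 split; the seed and shuffling strategy are
    assumptions, not taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(0.8 * n_samples)
    return indices[:n_train], indices[n_train:]

# 80% of NINCO's 5878 samples -> 4702 train, 1176 eval,
# matching the counts quoted above.
train_idx, eval_idx = split_80_20(5878)
```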
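The learning-rate schedule described in the setup (linear warmup for 5 epochs, then cosine decay, over 100 epochs) can be sketched as a plain function. The peak learning rates are the quoted ones; decaying to exactly zero is an assumption, since the excerpt does not state a floor:

```python
import math

def lr_at_epoch(epoch, peak_lr, warmup_epochs=5, total_epochs=100):
    """Linear warmup to peak_lr, then cosine decay toward 0.

    Matches the described schedule shape; the zero final LR is an
    assumption, not stated in the paper.
    """
    if epoch < warmup_epochs:
        # Ramp linearly so the last warmup epoch reaches peak_lr.
        return peak_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Quoted peak LRs for batch size 512:
# VGG17: 6e-3, ResNet18: 0.01, ViT-T: 8e-4
resnet_lr_mid = lr_at_epoch(50, 0.01)
```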