Neural Collapse: A Review on Modelling Principles and Generalization
Authors: Vignesh Kothapalli
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we review the principles which aid in modelling neural collapse, followed by the implications of this state on generalization and transfer learning capabilities of neural networks. Finally, we conclude by discussing potential avenues and directions for future research. ... We review and analyse NC modelling techniques based on the principles of unconstrained features and local elasticity, which is currently missing in the literature. We review and analyse the implications of NC on the generalization and transfer learning capabilities of deep classifier neural networks. |
| Researcher Affiliation | Academia | Vignesh Kothapalli EMAIL Courant Institute of Mathematical Sciences New York University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It describes methodologies and equations in prose and mathematical notation. |
| Open Source Code | No | The paper is a review and does not provide open-source code for its own methodology. Section A.5 "Code availability" lists open-source implementations for models discussed in the review, which belong to other research papers (e.g., "Neural Collapse (Papyan et al. (2020)) pytorch neuralcollapse"), not code for the current paper's contributions. |
| Open Datasets | Yes | Figure 1: Evolution of penultimate layer outputs of a VGG13 neural network when trained on the CIFAR10 dataset with 3 randomly selected classes. ... Ji et al. (2021) using ResNet18 and VGG13 networks to classify CIFAR10, MNIST, KMNIST and Fashion-MNIST data sets. ... experiments by Papyan et al. (2020) (see figure 5) show varying magnitude/extent of variance collapse depending on the complexity of data. For smaller data sets such as CIFAR10, a ResNet18 network attains a NC1 value of 10⁻², while for ImageNet, a ResNet152 network attains a NC1 value of 1. |
| Dataset Splits | No | The paper is a review and refers to experiments conducted by other papers. While these other papers may have specified dataset splits, this review paper does not explicitly provide specific training/test/validation dataset splits for its own experimental work. |
| Hardware Specification | No | The paper is a review and does not report on original experiments conducted by its author. Therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is a review and does not report on original experiments conducted by its author. While it mentions 'pytorch' in the context of other papers' code availability (Table 3), it does not specify any software dependencies with version numbers for its own work. |
| Experiment Setup | Yes | Figure 4: NC metrics of MLP and ResNet18 on a randomly labelled CIFAR10 dataset using cross-entropy loss. The width of the network is maintained across layers and varied across experiments. The first row corresponds to a 4-layer MLP, optimized using SGD with a learning rate 0.01 and weight decay 10⁻⁴. The second row corresponds to ResNet18, optimized using SGD with momentum 0.9, weight decay 5×10⁻⁴, initial learning rate 0.05, decreased by a factor of 10 every 40 epochs. ... Figure 7: Train vs test performance of ResNet18 on MNIST (first two plots) and CIFAR10 (last two plots) with SGD, Adam and LBFGS optimizers. SGD with momentum 0.9 and Adam with β1 = 0.9, β2 = 0.999 were initialized with learning rate 0.05, 0.001 respectively and scheduled to decrease by a factor of 10 every 40 epochs. LBFGS was initialized with memory size 10, learning rate 0.1 and employed a Wolfe line-search strategy for following iterations. Weight decay is commonly set to 5×10⁻⁴. |
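The learning-rate schedule quoted in the Experiment Setup row ("decreased by a factor of 10 every 40 epochs") can be made concrete with a small sketch. This is not code from the reviewed paper; the function name and defaults are illustrative, matching only the hyperparameters reported in Figures 4 and 7:

```python
def stepped_lr(initial_lr: float, epoch: int, step: int = 40, gamma: float = 0.1) -> float:
    """Step-decay schedule: multiply the learning rate by `gamma`
    once every `step` epochs (as reported for the SGD/Adam runs)."""
    return initial_lr * gamma ** (epoch // step)

# SGD run from Figure 7: initial lr 0.05, decayed at epochs 40, 80, ...
print(stepped_lr(0.05, 0))   # 0.05
print(stepped_lr(0.05, 45))  # 0.005
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)`.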
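The NC1 values cited in the Open Datasets row (10⁻² for ResNet18 on CIFAR10 vs. 1 for ResNet152 on ImageNet) measure within-class variability collapse. A minimal sketch of one common formulation, NC1 = tr(Σ_W Σ_B⁺)/C from Papyan et al. (2020), is shown below; the function name is illustrative and this is not code from the reviewed paper:

```python
import numpy as np

def nc1_metric(features: np.ndarray, labels: np.ndarray) -> float:
    """NC1 = tr(Sigma_W @ pinv(Sigma_B)) / C, where Sigma_W and Sigma_B are
    the within-class and between-class covariances of penultimate-layer
    features. Values near 0 indicate collapse of features to class means."""
    classes = np.unique(labels)
    C = len(classes)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((dim, dim))
    sigma_b = np.zeros((dim, dim))
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        centered = fc - mu_c                      # within-class deviations
        sigma_w += centered.T @ centered / len(fc)
        d = (mu_c - global_mean)[:, None]         # class-mean deviation
        sigma_b += d @ d.T
    sigma_w /= C
    sigma_b /= C
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / C)
```

For fully collapsed features (every sample equal to its class mean), Σ_W = 0 and the metric is exactly 0, which is the limit the review's NC1 plots track during training.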