Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Authors: Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments on various architectures (fully connected, ResNet) and datasets (MNIST, CIFAR) confirm the insights coming from the theory: (i) NC2 is more prominent as the depth of the linear head increases, and (ii) the final linear layers are balanced at convergence. Furthermore, we show that, as the non-linear part of the network gets deeper, the non-negative layers become less non-linear and more balanced. |
| Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, NYU; Institute of Science and Technology Austria. |
| Pseudocode | No | The paper describes methods mathematically and textually but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | Yes | In all experiments, we consider MSE loss and standard weight decay regularization. We train an MLP and a ResNet20 with an added MLP head on standard datasets (MNIST, CIFAR10), considering as backbone the first two layers for the MLP and the whole architecture before the linear head for the ResNet. |
| Dataset Splits | No | The paper mentions using "standard datasets (MNIST, CIFAR10)" but does not specify the exact training, validation, or test splits used for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | Yes | We use weight decay of 0.001 and learning rate of 0.001, training for 5000 epochs (the learning rate drops ten-fold after 80% of the epochs in all our experiments). ... We average over 5 runs for each weight decay value (0.001, 0.004), with a learning rate of 0.001. |
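The learning-rate schedule described in the experiment setup (a fixed base rate with a ten-fold drop after 80% of the epochs) can be sketched as follows. This is a hypothetical helper, not code from the paper, which releases no source; the constants are taken from the values quoted above.

```python
# Hedged sketch of the training schedule described in the paper's setup:
# weight decay 0.001, base learning rate 0.001, 5000 epochs, with the
# learning rate dropping ten-fold after 80% of the epochs.
# `learning_rate` is a hypothetical helper name, not from the paper.

TOTAL_EPOCHS = 5000
BASE_LR = 1e-3
WEIGHT_DECAY = 1e-3  # also 0.004 in one reported sweep


def learning_rate(epoch: int,
                  total_epochs: int = TOTAL_EPOCHS,
                  base_lr: float = BASE_LR) -> float:
    """Return the learning rate at a given epoch: base rate for the
    first 80% of training, then a ten-fold drop."""
    if epoch < 0.8 * total_epochs:
        return base_lr
    return base_lr / 10.0


print(learning_rate(0))     # 0.001
print(learning_rate(3999))  # 0.001 (last epoch before the drop)
print(learning_rate(4000))  # 0.0001
```

With 5000 epochs, the drop occurs at epoch 4000; the weight decay constant would be passed to the optimizer (e.g. SGD or Adam) alongside this schedule.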