Can Kernel Methods Explain How the Data Affects Neural Collapse?

Authors: Vignesh Kothapalli, Tom Tirer

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results on Gaussian data to show that the adaptivity of such kernels yields lower NC1 and allows them to reasonably approximate the NC1 behavior of shallow NNs for linearly separable datasets. We conduct experiments on datasets with varying sample sizes and input dimensions to verify our theoretical results and show that the insights generalize (e.g., beyond d0 = 1).
Researcher Affiliation | Academia | Vignesh Kothapalli (EMAIL), Courant Institute of Mathematical Sciences, New York University; Tom Tirer (EMAIL), Faculty of Engineering, Bar-Ilan University.
Pseudocode | No | The paper describes methods and equations, such as the Equations of State (EoS) for the data-aware GP kernel in Definition 6.1 and their numerical solutions in Section 6.2, but it does not present these as structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at: https://github.com/kvignesh1420/shallow_nc1.
Open Datasets | Yes | For C = 2, a dataset size N chosen from {128, 256, 512, 1024}, and input dimension d0 chosen from {1, 2, 8, 32, 128}, we create the data vector and label pairs as follows: D1(N, d0) = {(x_{1,i} ~ N(−2·1_{d0}, 0.25·I_{d0}), y_{1,i} = −1) : i ∈ [N/2]} ∪ {(x_{2,j} ~ N(2·1_{d0}, 0.25·I_{d0}), y_{2,j} = 1) : j ∈ [N/2]}.
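The dataset construction quoted above can be sketched in a few lines of NumPy: each class draws N/2 samples from an isotropic Gaussian with mean ±2·1_{d0} and variance 0.25 (i.e., standard deviation 0.5), labeled ∓1. The function name and seed below are illustrative, not from the paper's code.

```python
import numpy as np

def make_d1(N, d0, seed=0):
    """Sample the two-class Gaussian dataset D1(N, d0) described above."""
    rng = np.random.default_rng(seed)
    half = N // 2
    # Class 1: mean -2 in every coordinate, variance 0.25 => std 0.5, label -1.
    x1 = rng.normal(loc=-2.0, scale=0.5, size=(half, d0))
    # Class 2: mean +2 in every coordinate, label +1.
    x2 = rng.normal(loc=2.0, scale=0.5, size=(half, d0))
    X = np.concatenate([x1, x2], axis=0)
    y = np.concatenate([-np.ones(half), np.ones(half)])
    return X, y

X, y = make_d1(N=128, d0=2)
```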
Dataset Splits | No | The paper describes the generation of synthetic datasets D1(N, d0) and D2(N, d0) with parameters for the class distributions and sample sizes, e.g., D1(N, d0) = {(x_{1,i} ~ N(−2·1_{d0}, 0.25·I_{d0}), y_{1,i} = −1) : i ∈ [N/2]} ∪ {(x_{2,j} ~ N(2·1_{d0}, 0.25·I_{d0}), y_{2,j} = 1) : j ∈ [N/2]}. However, it does not explicitly state how these datasets are split into training, validation, or test sets for the experiments involving the 2L-FCN.
Hardware Specification | Yes | All the experiments in this paper were executed on a machine with 16 GB of host memory and 8 CPU cores.
Software Dependencies | No | The paper mentions using the scipy.optimize.newton_krylov Python API for solving the EoS, but it does not provide specific version numbers for Python, SciPy, or any other software libraries used in the experiments.
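For context, scipy.optimize.newton_krylov finds a root F(x) = 0 of a nonlinear system using a Jacobian-free Newton–Krylov iteration. The residual below is a toy stand-in chosen only to show the API shape; it is not the paper's actual Equations of State.

```python
import numpy as np
from scipy.optimize import newton_krylov

def residual(q):
    # Toy nonlinear system: find q such that q = tanh(2q) + 0.5,
    # written as a root-finding problem residual(q) = 0.
    return q - np.tanh(2.0 * q) - 0.5

q0 = np.ones(3)                                   # initial guess
q_star = newton_krylov(residual, q0, f_tol=1e-10)  # Jacobian-free Newton-Krylov solve
```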
Experiment Setup | Yes | Setup: We train a 2L-FCN with d1 = 500, σw = 1, σb = 0 and Erf activation using (vanilla) Gradient Descent with a learning rate of 10^-3 and weight decay 10^-6 for 1000 steps to reach the terminal phase of training... The ReLU activation experiments use a learning rate of 10^-4.
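The stated setup can be sketched as a minimal NumPy training loop: a two-layer fully connected network with hidden width d1 = 500, Erf activation, vanilla gradient descent with learning rate 10^-3 and weight decay 10^-6 for 1000 steps. The MSE loss, the NTK-style 1/sqrt(fan_in) initialization scaling, and the small toy dataset (N = 64, d0 = 2) are illustrative assumptions, not details taken from the paper.

```python
import math
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

# Toy stand-in for the paper's Gaussian data: class means at -2 and +2,
# variance 0.25; N and d0 here are illustrative choices.
N, d0, d1 = 64, 2, 500
X = np.concatenate([rng.normal(-2.0, 0.5, (N // 2, d0)),
                    rng.normal(2.0, 0.5, (N // 2, d0))])
y = np.concatenate([-np.ones(N // 2), np.ones(N // 2)])

# sigma_w = 1, sigma_b = 0, with NTK-style 1/sqrt(fan_in) scaling (an assumption).
W1 = rng.normal(0.0, 1.0 / math.sqrt(d0), (d0, d1))
w2 = rng.normal(0.0, 1.0 / math.sqrt(d1), d1)

lr, wd, steps = 1e-3, 1e-6, 1000
losses = []
for _ in range(steps):
    h = X @ W1                                   # pre-activations, shape (N, d1)
    a = erf(h)                                   # Erf activation
    pred = a @ w2                                # network output, shape (N,)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))      # MSE loss (an assumption)
    # Gradients of (1/2) * mean squared error, plus weight decay.
    grad_w2 = a.T @ err / N + wd * w2
    da = np.outer(err, w2) * (2.0 / math.sqrt(math.pi)) * np.exp(-h ** 2)  # erf'(h)
    grad_W1 = X.T @ da / N + wd * W1
    w2 -= lr * grad_w2                           # vanilla gradient descent step
    W1 -= lr * grad_W1
```

On this linearly separable toy data the training loss drops quickly, which matches the paper's description of reaching the terminal phase of training within 1000 steps.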