Generalization and Stability of Interpolating Neural Networks with Minimal Width
Authors: Hossein Taheri, Christos Thrampoulidis
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error ϵ and their distance from initialization is g(ϵ), we demonstrate that gradient descent with n training data achieves training error O(g(1/T)²/T) and generalization error O(g(1/T)²/n) at iteration T... Our results differ from existing generalization outcomes using the algorithmic-stability framework... Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property... Eventually, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression. |
| Researcher Affiliation | Academia | Hossein Taheri, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA; Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada |
| Pseudocode | No | The paper describes the gradient descent update rule as: "∀t ≥ 0: w_{t+1} = w_t − η ∇F̂(w_t)." This is presented as an equation within the text, rather than a formally labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper includes an example using a "noisy XOR data distribution" and describes how it is constructed: data points xᵢ = (x¹ᵢ, x²ᵢ, …, xᵈᵢ) have their first two coordinates in {(1, 0), (0, 1), (−1, 0), (0, −1)} and their remaining d − 2 coordinates in {−1, 1}^{d−2}; "The noisy XOR data distribution is the uniform distribution over the set with elements (xᵢ, yᵢ)." While the data generation is described, this is a theoretical construction for analysis rather than a publicly accessible empirical dataset with concrete access information (link, DOI, etc.). |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not perform empirical experiments requiring training, testing, or validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would involve hyperparameters or system-level training settings. |
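The gradient-descent update quoted in the Pseudocode row, w_{t+1} = w_t − η ∇F̂(w_t), can be sketched numerically. The sketch below uses a plain linear logistic-regression objective in place of the paper's shallow network, and every data shape and hyperparameter value is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def empirical_loss(w, X, y):
    """Empirical logistic loss F_hat(w) = mean over i of log(1 + exp(-y_i <w, x_i>))."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad_step(w, X, y, eta):
    """One gradient-descent step: w <- w - eta * grad F_hat(w)."""
    coeffs = -y / (1.0 + np.exp(y * (X @ w)))   # derivative of log(1 + exp(-m)) w.r.t. margin m
    grad = (X * coeffs[:, None]).mean(axis=0)
    return w - eta * grad

# Illustrative linearly separable data (sizes and seed are assumptions).
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = np.sign(X @ rng.standard_normal(5))

w = np.zeros(5)                      # initialization w_0 = 0
initial_loss = empirical_loss(w, X, y)
for _ in range(200):
    w = grad_step(w, X, y, eta=0.1)
final_loss = empirical_loss(w, X, y)
train_err = np.mean(np.sign(X @ w) != y)
print(initial_loss, final_loss, train_err)
```

On separable data like this, the training loss decreases monotonically toward the interpolating regime the paper studies, mirroring (in the convex linear case) the O(g(1/T)²/T) training-error behavior stated in the abstract.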
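The noisy XOR construction quoted in the Open Datasets row can be sketched as a sampler. The first two coordinates and the {−1, 1}^{d−2} noise coordinates follow the quoted description; the labeling rule (y = +1 iff the first coordinate is nonzero) is our reading of the standard XOR setup and is an assumption, as is every parameter value.

```python
import numpy as np

def sample_noisy_xor(n, d, seed=0):
    """Sample n points from a noisy-XOR-style distribution in R^d.

    First two coordinates are uniform over {(1,0), (0,1), (-1,0), (0,-1)};
    the remaining d-2 coordinates are uniform over {-1, +1} (the "noise").
    The labeling rule here is an assumed XOR convention, not quoted from the paper.
    """
    rng = np.random.default_rng(seed)
    core = np.array([(1, 0), (0, 1), (-1, 0), (0, -1)], dtype=float)
    idx = rng.integers(0, 4, size=n)
    noise = rng.choice([-1.0, 1.0], size=(n, d - 2))
    X = np.hstack([core[idx], noise])
    y = np.where(idx % 2 == 0, 1.0, -1.0)  # (+/-1, 0) -> +1, (0, +/-1) -> -1 (assumption)
    return X, y

X, y = sample_noisy_xor(8, d=6)
print(X.shape, y)
```

Each sampled point has exactly one nonzero entry among its first two coordinates, with the remaining coordinates acting as label-irrelevant noise, matching the uniform-over-the-set description quoted above.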