Generalization and Stability of Interpolating Neural Networks with Minimal Width

Authors: Hossein Taheri, Christos Thrampoulidis

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error ε and their distance from initialization is g(ε), we demonstrate that gradient descent with n training data achieves training error O(g(1/T)^2 / T) and generalization error O(g(1/T)^2 / n) at iteration T... Our results differ from existing generalization outcomes using the algorithmic-stability framework... Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property... Eventually, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.
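As a numerical illustration of the rates quoted above: in the linear logistic-regression setting referenced at the end of the abstract, the distance-to-interpolation function grows only logarithmically, so the stated bounds decay at a near-1/T training rate and a near-1/n generalization rate. A minimal sketch, assuming g(ε) = log(1/ε) and absolute constants equal to one (both hypothetical simplifications, not the paper's exact constants):

```python
import math

def g(eps):
    # Hypothetical distance-to-interpolation function; in the linear
    # logistic-regression case g grows only logarithmically in 1/eps.
    return math.log(1.0 / eps)

def train_error_bound(T):
    # O(g(1/T)^2 / T): near-1/T decay, up to log^2 T factors.
    return g(1.0 / T) ** 2 / T

def gen_error_bound(T, n):
    # O(g(1/T)^2 / n): shrinks with sample size n, up to log^2 T factors.
    return g(1.0 / T) ** 2 / n

for T in (10, 100, 1000):
    print(T, train_error_bound(T), gen_error_bound(T, n=1000))
```

The printout shows the training-error bound shrinking as T grows while the generalization-error bound stays controlled by the sample size n.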
Researcher Affiliation | Academia | Hossein Taheri, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA; Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada.
Pseudocode | No | The paper describes the gradient descent update rule as an inline equation: for t ≥ 0, w_{t+1} = w_t − η∇F̂(w_t), where F̂ denotes the empirical loss. This is presented as an equation within the text, rather than a formally labeled pseudocode or algorithm block.
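The inline update rule amounts to plain gradient descent on the empirical logistic loss of a one-hidden-layer network. The following sketch makes that concrete; the specific architecture choices (tanh activation, fixed ±1/√m outer weights, toy data, step size) are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): n samples in d dimensions, hidden width m.
n, d, m = 32, 5, 16
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                               # separable toy labels
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed outer weights

def forward(W):
    # f(x_i) = sum_j a_j * tanh(w_j . x_i); W has shape (m, d).
    return np.tanh(X @ W.T) @ a

def empirical_loss(W):
    # F_hat(W) = (1/n) sum_i log(1 + exp(-y_i * f(x_i)))
    return np.mean(np.log1p(np.exp(-y * forward(W))))

def grad(W):
    margins = y * forward(W)
    s = -y / (1.0 + np.exp(margins))   # derivative of logistic loss wrt f
    H = np.tanh(X @ W.T)
    dH = 1.0 - H ** 2                  # tanh'(w_j . x_i)
    # dF/dw_j = (1/n) sum_i s_i * a_j * tanh'(w_j . x_i) * x_i
    return (dH * s[:, None] * a[None, :]).T @ X / n

W = 0.1 * rng.standard_normal((m, d))
initial_loss = empirical_loss(W)
eta = 0.2
for t in range(300):                   # w_{t+1} = w_t - eta * grad F_hat(w_t)
    W = W - eta * grad(W)
final_loss = empirical_loss(W)
print(initial_loss, final_loss)
```

On this separable toy problem the training loss decreases steadily, mirroring the interpolating regime the paper analyzes.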
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper includes an example using a "noisy XOR data distribution" and describes its construction: each input x_i = (x_i^1, x_i^2, ..., x_i^d) is drawn from the set {(1, 0), (0, 1), (−1, 0), (0, −1)} × {−1, 1}^{d−2}, so the first two coordinates take one of four patterns and the remaining d − 2 coordinates are ±1, with labels y_i ∈ {−1, 1}; "The noisy XOR data distribution is the uniform distribution over the set with elements (x_i, y_i)." While the data generation is described, this is a theoretical construction for analysis rather than a publicly accessible empirical dataset with concrete access information (link, DOI, etc.).
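Although no downloadable dataset exists, the described construction is easy to sample from. A minimal sketch, assuming one common XOR labeling convention for the four core patterns (the paper's exact labeling may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# The four core patterns for the first two coordinates, with XOR-style
# labels: +1 on the x-axis patterns, -1 on the y-axis patterns
# (an assumed convention, not necessarily the paper's).
CORES = np.array([(1, 0), (0, 1), (-1, 0), (0, -1)], dtype=float)
LABELS = np.array([1.0, -1.0, 1.0, -1.0])

def sample_noisy_xor(n, d, rng):
    """Draw n points uniformly from {cores} x {-1,1}^(d-2) with labels."""
    idx = rng.integers(0, 4, size=n)                   # pick a core pattern
    noise = rng.choice([-1.0, 1.0], size=(n, d - 2))   # +/-1 noise coords
    X = np.concatenate([CORES[idx], noise], axis=1)
    return X, LABELS[idx]

X, y = sample_noisy_xor(8, 6, rng)
print(X.shape, y)
```

Each draw is uniform over the finite support, matching the quoted definition of the distribution.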
Dataset Splits | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not perform empirical experiments requiring training, testing, or validation dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific hardware specifications.
Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would involve hyperparameters or system-level training settings.