Generalization and Stability of Interpolating Neural Networks with Minimal Width
Authors: Hossein Taheri, Christos Thrampoulidis
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error ϵ and their distance from initialization is g(ϵ), we demonstrate that gradient descent with n training data achieves training error O(g(1/T)²/T) and generalization error O(g(1/T)²/n) at iteration T... Our results differ from existing generalization outcomes using the algorithmic-stability framework... Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property... Eventually, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression. |
| Researcher Affiliation | Academia | Hossein Taheri, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA; Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada |
| Pseudocode | No | The paper describes the gradient descent update rule as: "∀t ≥ 0: w_{t+1} = w_t − η ∇F̂(w_t)." This is presented as an equation within the text, rather than a formally labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper includes an example using a "noisy XOR data distribution" and describes how it is constructed: data points xᵢ = (x¹ᵢ, x²ᵢ, …, xᵈᵢ) have their first two coordinates in {(1, 0), (0, 1), (−1, 0), (0, −1)} and their remaining d − 2 coordinates in {−1, 1}^{d−2}; "The noisy XOR data distribution is the uniform distribution over the set with elements (xᵢ, yᵢ)." While the data generation is described, this is a theoretical construction for analysis rather than a publicly accessible empirical dataset with concrete access information (link, DOI, etc.). |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not perform empirical experiments requiring training, testing, or validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical analysis and proofs. It does not describe any empirical experiments that would involve hyperparameters or system-level training settings. |
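The gradient-descent update quoted in the Pseudocode row, w_{t+1} = w_t − η ∇F̂(w_t), can be sketched numerically. The sketch below uses a plain linear logistic-regression objective in place of the paper's shallow network, and every data shape and hyperparameter value is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def empirical_loss(w, X, y):
    """Empirical logistic loss F_hat(w) = mean over i of log(1 + exp(-y_i <w, x_i>))."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad_step(w, X, y, eta):
    """One gradient-descent step: w <- w - eta * grad F_hat(w)."""
    coeffs = -y / (1.0 + np.exp(y * (X @ w)))   # derivative of log(1 + exp(-m)) w.r.t. margin m
    grad = (X * coeffs[:, None]).mean(axis=0)
    return w - eta * grad

# Illustrative linearly separable data (sizes and seed are assumptions).
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = np.sign(X @ rng.standard_normal(5))

w = np.zeros(5)                      # initialization w_0 = 0
initial_loss = empirical_loss(w, X, y)
for _ in range(200):
    w = grad_step(w, X, y, eta=0.1)
final_loss = empirical_loss(w, X, y)
train_err = np.mean(np.sign(X @ w) != y)
print(initial_loss, final_loss, train_err)
```

On separable data like this, the training loss decreases monotonically toward the interpolating regime the paper studies, mirroring (in the convex linear case) the O(g(1/T)²/T) training-error behavior stated in the abstract.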
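The noisy XOR construction quoted in the Open Datasets row can be sketched as a sampler. The first two coordinates and the {−1, 1}^{d−2} noise coordinates follow the quoted description; the labeling rule (y = +1 iff the first coordinate is nonzero) is our reading of the standard XOR setup and is an assumption, as is every parameter value.

```python
import numpy as np

def sample_noisy_xor(n, d, seed=0):
    """Sample n points from a noisy-XOR-style distribution in R^d.

    First two coordinates are uniform over {(1,0), (0,1), (-1,0), (0,-1)};
    the remaining d-2 coordinates are uniform over {-1, +1} (the "noise").
    The labeling rule here is an assumed XOR convention, not quoted from the paper.
    """
    rng = np.random.default_rng(seed)
    core = np.array([(1, 0), (0, 1), (-1, 0), (0, -1)], dtype=float)
    idx = rng.integers(0, 4, size=n)
    noise = rng.choice([-1.0, 1.0], size=(n, d - 2))
    X = np.hstack([core[idx], noise])
    y = np.where(idx % 2 == 0, 1.0, -1.0)  # (+/-1, 0) -> +1, (0, +/-1) -> -1 (assumption)
    return X, y

X, y = sample_noisy_xor(8, d=6)
print(X.shape, y)
```

Each sampled point has exactly one nonzero entry among its first two coordinates, with the remaining coordinates acting as label-irrelevant noise, matching the uniform-over-the-set description quoted above.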