Overparameterization of Deep ResNet: Zero Loss and Mean-field Analysis

Authors: Zhiyan Ding, Shi Chen, Qin Li, Stephen J. Wright

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This article proposes a partial answer to the zero-loss training question for residual neural networks (ResNet), relying on three main toolboxes: a continuous-limit argument, a mean-field-limit argument, and gradient-flow analysis. These toolboxes are used to translate gradient descent for parameter training into a partial differential equation (PDE), where PDE analysis (specifically, steady-state equilibrium analysis) is employed to trace the convergence to the minimizer. First, a mean-field-limit argument proves that gradient descent for parameter training becomes a gradient flow for a probability distribution characterized by a PDE in the large-NN limit (i.e., as the network's depth and width grow). Next, under certain assumptions, the solution to the PDE converges over training time to a zero-loss solution. Together, these results suggest that training the ResNet gives a near-zero loss if the ResNet is large enough. Contribution 1: a rigorous proof of the continuous and mean-field limits (Theorem 6). Contribution 2: the global minimum can be achieved in the continuous setting under certain conditions (Theorem 7). (A generic sketch of the mean-field gradient-flow form appears after this table.)
Researcher Affiliation | Academia | Zhiyan Ding (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Shi Chen (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Qin Li (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Stephen J. Wright (EMAIL), Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA.
Pseudocode | No | The paper describes mathematical derivations, proofs, and theoretical analyses; it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor links to code repositories for the methodology described.
Open Datasets | No | The paper discusses 'training data' and 'input data' generically, referring to 'x' and a probability distribution 'µ', but does not mention or provide access information for any specific public datasets.
Dataset Splits | No | The paper does not mention any specific datasets, so no information about training/validation/test splits is provided.
Hardware Specification | No | The paper is theoretical, focusing on mathematical proofs and analysis; it describes no experiments that would require hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations; it describes no computational experiments or simulations that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical analysis, proofs, and mathematical modeling of neural-network behavior in limiting regimes; it describes no empirical experiments, so no setup details, hyperparameters, or training configurations are provided.
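For context on the mean-field claim referenced in the Research Type row: in this line of work, gradient descent on network parameters is shown to converge, as the network grows, to a Wasserstein gradient flow on a distribution over parameters. The LaTeX sketch below writes out the standard generic form of such a flow; the symbols (rho for the parameter distribution, theta for parameters, s for training time, E for the risk functional) are illustrative notation assumed here, not the paper's exact definitions.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Generic mean-field (Wasserstein) gradient-flow sketch.
% NOTE: rho, theta, s, and E are notation assumed for this sketch,
% not the paper's exact definitions.
%   rho(theta, s): distribution over network parameters theta at
%                  training time s.
%   E[rho]:        risk (loss) as a functional of rho, with first
%                  variation \delta E / \delta rho.
In the mean-field limit, gradient descent on parameters becomes a
gradient flow on the parameter distribution $\rho$:
\[
  \partial_s \rho(\theta, s)
  \;=\;
  \nabla_\theta \cdot
  \left(
    \rho(\theta, s)\,
    \nabla_\theta \frac{\delta E}{\delta \rho}(\theta, s)
  \right).
\]
% Along this flow, E[rho(., s)] is non-increasing; a zero-loss result
% corresponds to showing E[rho(., s)] -> 0 as s -> infinity under
% suitable assumptions.
\end{document}
```

Under this form, the two contributions cited in the table correspond to (i) justifying the passage from discrete gradient descent to the flow above, and (ii) analyzing the flow's long-time behavior; the sketch is only meant to fix the shape of the object being analyzed, not to restate the paper's theorems.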