Overparameterization of Deep ResNet: Zero Loss and Mean-field Analysis

Authors: Zhiyan Ding, Shi Chen, Qin Li, Stephen J. Wright

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This article proposes a partial answer to the zero-loss training question for residual neural networks (ResNet), relying on three main toolboxes: a continuous-limit argument, a mean-field-limit argument, and gradient-flow analysis. These toolboxes are used to translate gradient descent for parameter training into a partial differential equation (PDE), where PDE analysis (specifically, steady-state equilibrium analysis) is employed to trace the convergence to the minimizer. First, a mean-field-limit argument proves that gradient descent for parameter training becomes a gradient flow for a probability distribution characterized by a PDE in the large-NN limit (i.e., as the network's depth and width grow). Next, under certain assumptions, the solution to the PDE converges over training time to a zero-loss solution. Together, these results suggest that training the ResNet gives a near-zero loss if the ResNet is large enough. Contribution 1: a rigorous proof of the continuous and mean-field limits (Theorem 6). Contribution 2: the global minimum can be achieved in the continuous setting under certain conditions (Theorem 7). (A generic sketch of the mean-field gradient-flow form appears after this table.)
Researcher Affiliation | Academia | Zhiyan Ding (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Shi Chen (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Qin Li (EMAIL), Mathematics Department, University of Wisconsin-Madison, Madison, WI 53706, USA; Stephen J. Wright (EMAIL), Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA.
Pseudocode | No | The paper describes mathematical derivations, proofs, and theoretical analyses; it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor links to code repositories for the methodology described.
Open Datasets | No | The paper discusses 'training data' and 'input data' generically, referring to 'x' and a probability distribution 'µ', but does not mention or provide access information for any specific public datasets.
Dataset Splits | No | The paper does not mention any specific datasets, so no information about training/validation/test splits is provided.
Hardware Specification | No | The paper is theoretical, focusing on mathematical proofs and analysis; it describes no experiments that would require hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations; it describes no computational experiments or simulations that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical analysis, proofs, and mathematical modeling of neural-network behavior in limiting regimes; it describes no empirical experiments, so no setup details, hyperparameters, or training configurations are provided.
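For context on the mean-field claim referenced in the Research Type row: in this line of work, gradient descent on network parameters is shown to converge, as the network grows, to a Wasserstein gradient flow on a distribution over parameters. The LaTeX sketch below writes out the standard generic form of such a flow; the symbols (rho for the parameter distribution, theta for parameters, s for training time, E for the risk functional) are illustrative notation assumed here, not the paper's exact definitions.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Generic mean-field (Wasserstein) gradient-flow sketch.
% NOTE: rho, theta, s, and E are notation assumed for this sketch,
% not the paper's exact definitions.
%   rho(theta, s): distribution over network parameters theta at
%                  training time s.
%   E[rho]:        risk (loss) as a functional of rho, with first
%                  variation \delta E / \delta rho.
In the mean-field limit, gradient descent on parameters becomes a
gradient flow on the parameter distribution $\rho$:
\[
  \partial_s \rho(\theta, s)
  \;=\;
  \nabla_\theta \cdot
  \left(
    \rho(\theta, s)\,
    \nabla_\theta \frac{\delta E}{\delta \rho}(\theta, s)
  \right).
\]
% Along this flow, E[rho(., s)] is non-increasing; a zero-loss result
% corresponds to showing E[rho(., s)] -> 0 as s -> infinity under
% suitable assumptions.
\end{document}
```

Under this form, the two contributions cited in the table correspond to (i) justifying the passage from discrete gradient descent to the flow above, and (ii) analyzing the flow's long-time behavior; the sketch is only meant to fix the shape of the object being analyzed, not to restate the paper's theorems.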