Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Authors: Dennis Chemnitz, Maximilian Engel
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While the present work is purely theoretical, the Lyapunov exponent λ(x*) of a global minimum x* is also an interesting object for empirical studies. |
| Researcher Affiliation | Academia | Dennis Chemnitz (EMAIL), Fachbereich Mathematik und Informatik, Freie Universität Berlin, 14195 Berlin, Germany; Maximilian Engel (EMAIL), KdV Institute for Mathematics, University of Amsterdam, 1090 GE Amsterdam, Netherlands, and Fachbereich Mathematik und Informatik, Freie Universität Berlin, 14195 Berlin, Germany |
| Pseudocode | No | The paper contains detailed mathematical derivations, theorems, and proofs (e.g., Section 3.1: Overview, Section 3.3: Gradient Descent: the Stable Case, Section 3.5: Random Dynamical System Framework for SGD). It does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | While the present work is purely theoretical... We leave it as an open problem to the community to determine the behavior of λ by means of empirical experiments. An interesting challenge in conducting such a study lies in the numerical computation of the Lyapunov exponent for high-dimensional random-matrix products. Numerical schemes for the computation of Lyapunov exponents can be found, for example, in Eckmann and Ruelle (1985) and Sandri (1996). |
| Open Datasets | No | In the following, we consider a scalar regression problem. Let f̂ : ℝ^d → ℝ be a ground truth function, which is supposed to be reconstructed from N given data pairs (y_i, z_i = f̂(y_i))_{i ∈ [N]}, where [N] = {1, . . . , N}. To do so, we consider a parameterized network model given by a smooth function F : ℝ^D × ℝ^d → ℝ and try, using the given data, to find an x ∈ ℝ^D such that F(x, ·) ≈ f̂. |
| Dataset Splits | No | The paper is purely theoretical and focuses on mathematical derivations and proofs. It does not describe any empirical experiments or the use of specific datasets, thus no dataset splits are provided. |
| Hardware Specification | No | The paper is purely theoretical, focusing on mathematical frameworks and proofs. No experiments are described, and consequently, no hardware specifications are provided. |
| Software Dependencies | No | The paper is purely theoretical and focuses on mathematical concepts and proofs. It does not describe any software implementation or experimental setup, therefore no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is purely theoretical and presents a mathematical framework for analyzing the stability of gradient descent algorithms. It does not include any empirical experiments or details about an experimental setup, such as specific hyperparameter values or training configurations. |
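The response quoted under Open Source Code notes that the authors leave the numerical study of the Lyapunov exponent for high-dimensional random-matrix products as an open problem. A minimal sketch of the standard norm-renormalization scheme (in the spirit of the numerical methods surveyed in Eckmann and Ruelle, 1985) estimates the top Lyapunov exponent by repeatedly applying sampled matrices to a unit vector and averaging the log-growth. Here `sample_matrix` is a hypothetical callable standing in for the random Jacobians of the SGD step; it is not part of the paper.

```python
import numpy as np

def top_lyapunov_exponent(sample_matrix, n_steps=10_000, dim=2, seed=0):
    """Estimate the top Lyapunov exponent of a product of i.i.d. random
    matrices by tracking the log-growth of a renormalized direction vector.

    sample_matrix: callable taking an np.random.Generator and returning a
                   (dim, dim) matrix drawn from the matrix distribution.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)          # start from a random unit vector
    log_growth = 0.0
    for _ in range(n_steps):
        v = sample_matrix(rng) @ v  # apply one sampled matrix
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)  # accumulate log of the growth factor
        v /= norm                   # renormalize to avoid over/underflow
    return log_growth / n_steps     # time-averaged exponential growth rate
```

For products of matrices that are all equal to 2·I, the estimator returns exactly log 2, which makes a convenient sanity check; for genuinely random products, longer runs (and several seeds) are needed for a stable estimate.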
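The scalar regression setup quoted under Open Datasets can be made concrete with a toy instance. Everything below is an illustrative assumption, not the paper's experiment: a linear model F(x, y) = ⟨x, y⟩ stands in for the general smooth network, with arbitrary data sizes and step size, trained by single-sample SGD on the squared loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 50
x_true = rng.standard_normal(d)       # parameters of the ground truth f̂
ys = rng.standard_normal((N, d))      # inputs y_i
zs = ys @ x_true                      # targets z_i = f̂(y_i), realizable case

x = np.zeros(d)                       # parameters x ∈ ℝ^D to learn (here D = d)
eta = 0.02                            # step size
for step in range(2000):
    i = rng.integers(N)               # SGD: one random data pair per step
    # gradient of (F(x, y_i) - z_i)^2 for the linear model F(x, y) = <x, y>
    grad = 2.0 * (ys[i] @ x - zs[i]) * ys[i]
    x -= eta * grad

residual = np.max(np.abs(x - x_true)) # distance to the interpolating minimum
```

Since the data are exactly realizable, x_true is a global minimum with zero loss, and with a small enough step size the iterates converge to it; the paper's stability question is precisely when such a minimum attracts the SGD dynamics.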