Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Authors: Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime called stochastic modified flows. These SDEs are driven by a cylindrical Brownian motion and improve the so-called stochastic modified equations by having regular diffusion coefficients and by matching the multi-point statistics. As a second contribution, we introduce distribution dependent stochastic modified flows which we prove to describe the fluctuating limiting dynamics of stochastic gradient descent in the small learning rate infinite width scaling regime. Keywords: stochastic gradient descent, machine learning, overparametrization, stochastic modified equation, fluctuation mean field limit |
| Researcher Affiliation | Academia | Benjamin Gess EMAIL Fakultät für Mathematik, Universität Bielefeld, 33615 Bielefeld, Germany, and Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany; Sebastian Kassing EMAIL Fakultät für Mathematik, Universität Bielefeld, 33615 Bielefeld, Germany; Vitalii Konarovskyi EMAIL Fakultät für Mathematik, Informatik und Naturwissenschaften, Universität Hamburg, 20146 Hamburg, Germany, and Institute of Mathematics of NAS of Ukraine, 01024 Kyiv, Ukraine |
| Pseudocode | No | The paper presents mathematical theorems, definitions, and derivations related to stochastic differential equations and stochastic gradient descent dynamics, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide links to a code repository. |
| Open Datasets | No | The paper discusses a 'training data set Ξ ⊂ ℝ^{n₀} sampled from a probability distribution ϑ' and 'a given training data set with inputs Ξ = {ξ : (ξ, f(ξ)) ∈ D} and labels {f(ξ) : (ξ, f(ξ)) ∈ D}'. These references are to generic conceptual data, not specific, named, or publicly available datasets used for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments using specific datasets, therefore, no information on training/test/validation dataset splits is provided. |
| Hardware Specification | No | This paper is purely theoretical, focusing on mathematical modeling and proofs. It does not describe any computational experiments or the hardware used to perform them. |
| Software Dependencies | No | The paper is theoretical and does not describe any computational experiments, thus it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is a theoretical work focusing on mathematical dynamics and proofs. It does not describe any experimental setups, hyperparameters, or training configurations. |
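Since the paper is purely theoretical, there is no reference implementation to reproduce. The core idea it builds on, however, is easy to illustrate: in the small-learning-rate regime, SGD iterates are well approximated by an SDE whose drift is the full gradient and whose diffusion comes from the gradient noise. The sketch below is not the paper's construction (which involves cylindrical Brownian motion and multi-point statistics); it is a minimal, hypothetical 1-D example assuming a least-squares loss with Gaussian data, comparing plain SGD to an Euler-Maruyama discretization of the first-order SDE approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: loss f(x) = E[(x - xi)^2 / 2] with data xi ~ N(0, 1),
# so the full gradient is x, the minimizer is 0, and the per-sample
# gradient noise (x - xi) - x = -xi has unit variance.
eta = 0.01       # small learning rate
n_steps = 2000

def sgd_path(x0=1.0):
    """Plain SGD: x_{k+1} = x_k - eta * (x_k - xi_k)."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        xi = rng.standard_normal()
        x = x - eta * (x - xi)   # stochastic gradient of (x - xi)^2 / 2
        path.append(x)
    return np.array(path)

def sde_path(x0=1.0):
    """Euler-Maruyama for the first-order SDE approximation
    dX = -X dt + sqrt(eta) dW, run with time step dt = eta so that one
    Euler step corresponds to one SGD update."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = np.sqrt(eta) * rng.standard_normal()   # Brownian increment, dt = eta
        x = x - eta * x + np.sqrt(eta) * dw
        path.append(x)
    return np.array(path)

sgd = sgd_path()
sde = sde_path()
print(f"final SGD iterate: {sgd[-1]:+.4f}, final SDE iterate: {sde[-1]:+.4f}")
```

Both trajectories decay from the initial point toward the minimizer and then fluctuate with standard deviation of order sqrt(eta); the SDE increment eta * N(0, 1) matches the SGD noise increment in distribution, which is the sense in which the modified equation tracks the discrete dynamics for small eta.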