Almost Sure Convergence of Stochastic Gradient Methods under Gradient Domination

Authors: Simon Weissmann, Sara Klein, Waïss Azizian, Leif Döring

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We prove almost sure convergence rates $f(X_n) - f^* \in o\big(n^{-1/(4\beta-1)+\epsilon}\big)$ of the last iterate for stochastic gradient descent (with and without momentum) under global and local β-gradient domination assumptions. The almost sure rates get arbitrarily close to recent rates in expectation. Finally, we demonstrate how to apply our results to training tasks in both supervised and reinforcement learning. ... We summarize the contributions of this paper in Table 1. These findings are also illustrated in a numerical toy experiment in Appendix B, where we have implemented SGD and SHB for monomials of increasing degree.
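For context, β-gradient domination is a Łojasiewicz-type inequality; one common formulation (the paper's exact constants and normalization may differ) is:

```latex
% beta-gradient domination: there exists c > 0 such that for all x,
\[
  f(x) - f^* \;\le\; c \,\lVert \nabla f(x) \rVert^{1/\beta},
  \qquad \beta \in \big[\tfrac{1}{2}, 1\big].
\]
% beta = 1/2 recovers the Polyak-Lojasiewicz (PL) inequality
%   f(x) - f^* <= c ||grad f(x)||^2,
% and the monomial f_p(x) = |x|^p satisfies the condition
% with beta = (p - 1)/p.
```

Note that under this formulation the rate exponent $-1/(4\beta-1)+\epsilon$ approaches the classical $n^{-1}$ rate for PL functions as $\beta \to 1/2$.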
Researcher Affiliation: Academia. Simon Weissmann (EMAIL), Institute of Mathematics, University of Mannheim, 68138 Mannheim, Germany; Sara Klein (EMAIL), Institute of Mathematics, University of Mannheim, 68138 Mannheim, Germany; Waïss Azizian (EMAIL), Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France; Leif Döring (EMAIL), Institute of Mathematics, University of Mannheim, 68138 Mannheim, Germany.
Pseudocode: No. The paper provides mathematical definitions of algorithms (e.g., the (SGD) and (SHB) iterative updates) but does not include structured pseudocode or algorithm blocks with numbered steps or explicit 'Algorithm' labels.
Open Source Code: No. The paper states in Appendix B: 'Both algorithms have been implemented by hand using MATLAB.' However, it does not provide any explicit statement about releasing the code, a repository link, or mention of code in supplementary materials for public access.
Open Datasets: No. The paper discusses applications in 'supervised learning' and 'reinforcement learning' and a 'numerical toy experiment' using a synthetic function `f_p(x) = |x|^p`. It describes generic training data ($((Z^{(m)}, Y^{(m)}))_{m \in \mathbb{N}}$ generated as i.i.d. samples from an unknown distribution $\mu_{(Z,Y)}$) and problem setups for MDPs, but does not provide concrete access information (links, DOIs, formal citations to specific public datasets) for any dataset used in experimental validation.
Dataset Splits: No. The paper's experimental validation is based on a 'numerical toy experiment' using a synthetic function `f_p(x) = |x|^p` perturbed by noise, not on a dataset that would typically require training/test/validation splits. Therefore, no dataset split information is provided.
Hardware Specification: No. The paper mentions in Appendix B: 'Both algorithms have been implemented by hand using MATLAB.' However, no specific hardware details such as GPU models, CPU types, or memory specifications are provided for these implementations.
Software Dependencies: No. The paper states in Appendix B: 'Both algorithms have been implemented by hand using MATLAB.' While 'MATLAB' is mentioned, a specific version number is not provided, nor are any other software dependencies with version numbers.
Experiment Setup: Yes. Details of the implementation: Both algorithms have been implemented by hand using MATLAB. We have initialized both SGD and SHB with the initial state $X_1 \sim \tfrac{1}{2}U([1.5, 2.5]) + \tfrac{1}{2}U([-2.5, -1.5])$ to force initializations that are not close to the actual minimum $x^* = 0$. The initial step sizes $\gamma_1(\beta)$ for both algorithms are chosen as $\gamma_1(0.5) = 0.2$, $\gamma_1(0.67) = 0.13$, $\gamma_1(0.83) = 0.004$, $\gamma_1(0.92) = 10^{-6}$, through which we counteract the decreasing smoothness as $\beta \to 1$. The momentum parameter for SHB is fixed for all $\beta$ at $\nu = 0.5$. The exact gradients $\nabla f_p$ are perturbed by independent additive noise following a standard normal distribution $N(0, 1)$. ... For each setting we have simulated 100 runs of length $N = 10^5$.
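The setup above can be sketched in a few lines of Python (the paper's own implementation is in MATLAB; the polynomial step-size decay and the correspondence $\beta = (p-1)/p$ between the domination exponent and the monomial degree are assumptions, since the excerpt does not state the schedule):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_fp(x, p):
    """Exact gradient of the monomial f_p(x) = |x|^p."""
    return p * np.sign(x) * np.abs(x) ** (p - 1)

def init_state():
    # X_1 ~ (1/2) U([1.5, 2.5]) + (1/2) U([-2.5, -1.5]):
    # a two-sided uniform mixture bounded away from the minimum x* = 0.
    return rng.choice([-1.0, 1.0]) * rng.uniform(1.5, 2.5)

def run_sgd(p, gamma1, n_steps, decay=1.0):
    x = init_state()
    for n in range(1, n_steps + 1):
        gamma = gamma1 / n ** decay           # assumed polynomial decay
        noise = rng.standard_normal()         # additive N(0, 1) noise
        x -= gamma * (grad_fp(x, p) + noise)
    return x

def run_shb(p, gamma1, n_steps, nu=0.5, decay=1.0):
    # Stochastic heavy ball with fixed momentum nu = 0.5.
    x = x_prev = init_state()
    for n in range(1, n_steps + 1):
        gamma = gamma1 / n ** decay
        noise = rng.standard_normal()
        x, x_prev = x - gamma * (grad_fp(x, p) + noise) + nu * (x - x_prev), x
    return x

# beta = 0.5 would correspond to p = 2 under beta = (p - 1)/p.
final_sgd = run_sgd(p=2, gamma1=0.2, n_steps=10_000)
final_shb = run_shb(p=2, gamma1=0.2, n_steps=10_000)
```

Averaging the final error over many such runs, as the paper does with 100 runs of length $10^5$, gives an empirical picture of the almost sure last-iterate rate.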