Convergence of Distributed Adaptive Optimization with Local Updates

Authors: Ziheng Cheng, Margalit Glasgow

ICLR 2025

Reproducibility Assessment
Research Type: Theoretical. For the first time, we prove that Local SGD with momentum (Local SGDM) and Local Adam can outperform their minibatch counterparts in convex and weakly convex settings, respectively, in certain regimes. The analysis relies on a novel technique for proving contraction during local iterations, a crucial yet challenging step in establishing the advantages of local updates, under a generalized smoothness assumption and a gradient clipping strategy.
Researcher Affiliation: Academia. Ziheng Cheng (University of California, Berkeley) and Margalit Glasgow (Massachusetts Institute of Technology).
Pseudocode: Yes. Local Adam is shown in Algorithm 1; it is a natural extension of centralized Adam (Kingma & Ba, 2014).
Open Source Code: No. The paper contains no explicit statement about releasing source code and provides no links to code repositories.
Open Datasets: No. The paper is theoretical; it discusses a general data distribution D and a stochastic gradient oracle F(x; ξ), but does not mention or provide access information for any specific, publicly available dataset.
Dataset Splits: No. As the paper is theoretical and reports no experiments on specific datasets, it does not mention training/validation/test splits.
Hardware Specification: No. The paper focuses on theoretical convergence analysis and does not describe any experimental hardware.
Software Dependencies: No. As a primarily theoretical work, the paper does not specify any software dependencies or version numbers.
Experiment Setup: No. The paper is theoretical, providing convergence proofs and analysis; it includes no experimental setup section and no details on hyperparameters or training configurations.
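To make the assessed algorithm concrete, the following is a minimal sketch of the Local Adam scheme the paper analyzes: each worker runs several Adam steps on clipped stochastic gradients, then iterates and moment estimates are averaged at every communication round. This is an illustrative reconstruction, not the paper's Algorithm 1 verbatim; the function name, hyperparameter defaults, and the synthetic noisy-gradient oracle are all assumptions.

```python
import numpy as np

def local_adam(grad_fn, x0, workers=4, rounds=10, local_steps=8,
               lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8,
               clip=1.0, noise=0.01, seed=0):
    """Illustrative sketch of Local Adam with gradient clipping.

    Each of `workers` nodes keeps its own iterate and Adam moment
    estimates, runs `local_steps` Adam updates on noisy clipped
    gradients, and all states are averaged at each communication round.
    """
    rng = np.random.default_rng(seed)
    xs = [np.array(x0, dtype=float) for _ in range(workers)]
    m = [np.zeros_like(xs[0]) for _ in range(workers)]
    v = [np.zeros_like(xs[0]) for _ in range(workers)]
    t = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            t += 1
            for i in range(workers):
                # Stochastic gradient; the paper pairs clipping with a
                # generalized smoothness assumption.
                g = grad_fn(xs[i]) + noise * rng.standard_normal(xs[i].shape)
                gnorm = np.linalg.norm(g)
                if gnorm > clip:
                    g = g * (clip / gnorm)
                # Standard Adam moment updates with bias correction.
                m[i] = beta1 * m[i] + (1 - beta1) * g
                v[i] = beta2 * v[i] + (1 - beta2) * g * g
                mhat = m[i] / (1 - beta1 ** t)
                vhat = v[i] / (1 - beta2 ** t)
                xs[i] = xs[i] - lr * mhat / (np.sqrt(vhat) + eps)
        # Communication round: average iterates and moment estimates.
        x_bar = np.mean(xs, axis=0)
        m_bar = np.mean(m, axis=0)
        v_bar = np.mean(v, axis=0)
        xs = [x_bar.copy() for _ in range(workers)]
        m = [m_bar.copy() for _ in range(workers)]
        v = [v_bar.copy() for _ in range(workers)]
    return xs[0]
```

On a toy convex objective such as f(x) = ||x||^2 / 2 (so grad_fn = lambda z: z), the averaged iterate drifts toward the minimizer, illustrating the convergence behavior the paper studies; replacing the Adam moments with a single momentum buffer recovers the Local SGDM variant.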