GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

Authors: Arto Maranjyan, Mher Safaryan, Peter Richtárik

TMLR 2025

Reproducibility Variable — Result — LLM Response
Research Type: Experimental. "Finally, we present an empirical study on carefully designed toy problems that confirm our theoretical claims." (Section 5, Experiments: "To test the performance of GradSkip and illustrate theoretical results, we use the classical logistic regression problem.")
Researcher Affiliation: Academia. Artavazd Maranjyan, Mher Safaryan, and Peter Richtárik are all affiliated with King Abdullah University of Science and Technology (KAUST).
Pseudocode: Yes. Algorithm 1 (GradSkip), Algorithm 2 (GradSkip+), Algorithm 3 (VR-GradSkip+).
Open Source Code: No. The paper neither provides a link to source code for the described methods nor states that the code is released.
Open Datasets: Yes. Experiments were conducted on artificially generated data and on the "australian" and "w6a" datasets from the LibSVM library (Chang & Lin, 2011).
Dataset Splits: No. The paper states "We split the dataset equally into n = 20 devices" for the "australian" dataset, referring to data distribution among clients in a federated learning setup, but does not provide training/validation/test splits, percentages, or sample counts.
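For context, the equal split across n = 20 clients described in the quote could be implemented as in the sketch below. The helper name, the shuffling, and the synthetic stand-in data are assumptions; the paper does not describe its splitting code (the "australian" dataset has 690 rows and 14 features, so 20 equal shards hold 34 rows each).

```python
import numpy as np

def split_equally(X, y, n_clients=20, seed=0):
    """Shuffle the dataset and split it into n_clients equal shards,
    dropping any remainder so every client holds the same number of rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    per_client = len(y) // n_clients
    idx = idx[: per_client * n_clients]          # drop remainder rows
    shards = idx.reshape(n_clients, per_client)  # one index row per client
    return [(X[s], y[s]) for s in shards]

# Synthetic data standing in for "australian" (690 samples, 14 features)
X = np.random.randn(690, 14)
y = np.sign(np.random.randn(690))
clients = split_equally(X, y, n_clients=20)
print(len(clients), clients[0][0].shape)  # 20 shards of 34 rows each
```

Dropping the remainder keeps shards exactly equal, matching the "split the dataset equally" wording; an alternative is to distribute the leftover rows one per client.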
Hardware Specification: No. The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No. The paper cites the LibSVM library (Chang & Lin, 2011) as a data source for the experiments, but does not list software dependencies (programming languages, frameworks, or libraries) with version numbers for its own implementation.
Experiment Setup: Yes. The regularization parameter is set to λ = 10⁻⁴ Lmax. GradSkip and ProxSkip are run for 3000 communication rounds, and all algorithms use their theoretically optimal hyperparameters (stepsize, probabilities).
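The regularized logistic regression setup quoted above can be sketched as follows. This is a minimal single-machine stand-in, not the GradSkip/ProxSkip algorithms themselves; the choice of Lmax as the largest per-sample smoothness constant ‖aᵢ‖²/4 and the plain gradient-descent loop are assumptions for illustration.

```python
import numpy as np

def logistic_loss(w, A, b, lam):
    """l2-regularized logistic loss: (1/m) sum log(1 + exp(-b_i a_i^T w)) + (lam/2)||w||^2."""
    z = b * (A @ w)
    return np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * (w @ w)

def logistic_grad(w, A, b, lam):
    """Gradient of the regularized logistic loss."""
    z = b * (A @ w)
    s = -b / (1.0 + np.exp(z))       # per-sample derivative of log1p(exp(-z))
    return A.T @ s / len(b) + lam * w

np.random.seed(0)
A = np.random.randn(200, 10)          # synthetic data matrix
b = np.sign(np.random.randn(200))     # binary labels in {-1, +1}

# Lmax = max_i ||a_i||^2 / 4 is the largest per-sample smoothness constant
Lmax = np.max(np.sum(A * A, axis=1)) / 4
lam = 1e-4 * Lmax                     # lambda = 10^-4 Lmax, as in the quote

w = np.zeros(10)
for _ in range(100):                  # plain gradient descent as a sanity check
    w -= (1.0 / (Lmax + lam)) * logistic_grad(w, A, b, lam)
```

With stepsize 1/(Lmax + λ), the loss decreases monotonically, since the objective's smoothness constant is at most Lmax + λ.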