Enhancing Parallelism in Decentralized Stochastic Convex Optimization

Authors: Ofri Eisen, Ron Dorfman, Kfir Yehuda Levy

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we empirically evaluate our method on both a synthetic least squares problem and an image classification task. All experiments are conducted using three random seeds, and we report the averaged results. [...] In Figure 1, we plot the final error as a function of the number of machines, across four configurations defined by σ, ζ ∈ {1, 10}, with different colors indicating the underlying topology. [...] Figure 2 depicts the test accuracy for M = 8 and 16 machines under heterogeneous (α = 0.1) and nearly homogeneous (α = 10) data, with different colors indicating the compared methods."
Researcher Affiliation | Academia | "Department of Electrical and Computer Engineering, Technion, Haifa, Israel. Correspondence to: Ofri Eisen <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Decentralized SGD (D-SGD) [...] Algorithm 2: Decentralized Anytime SGD (DAT-SGD)"
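To make the referenced Algorithm 1 concrete, the following is a minimal sketch of the standard D-SGD update: each worker takes a local stochastic gradient step, then gossip-averages its parameters with neighbors via a doubly stochastic mixing matrix `W`. Function names and signatures are illustrative, not the paper's code.

```python
import numpy as np

def d_sgd(grad_fns, W, x0, eta, T):
    """Sketch of decentralized SGD (D-SGD).

    grad_fns : one stochastic-gradient callable per worker, g(x) -> grad
    W        : (M, M) doubly stochastic mixing matrix over the topology
    x0       : shared initial parameter vector
    eta, T   : step size and number of iterations
    """
    M = len(grad_fns)
    X = np.tile(x0, (M, 1)).astype(float)  # one parameter row per worker
    for _ in range(T):
        # Each worker computes a gradient at its own iterate.
        G = np.stack([g(x) for g, x in zip(grad_fns, X)])
        # Local SGD step followed by one round of gossip averaging.
        X = W @ (X - eta * G)
    return X.mean(axis=0)  # consensus (network-average) iterate
```

On a fully connected topology (`W` all-ones divided by `M`) with per-worker quadratics, this converges to the minimizer of the average objective, which matches the least-squares setting described in the report.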
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code nor a link to a code repository.
Open Datasets | Yes | "Next, we evaluate our method on the Fashion-MNIST (Xiao et al., 2017) image classification task using the LeNet (LeCun et al., 1998) architecture."
Dataset Splits | No | The paper states: "The data is partitioned among workers following a Dirichlet distribution with parameter α, which controls the heterogeneity level (Hsu et al., 2019)." This describes data distribution across machines but does not provide specific train/test/validation splits for the Fashion-MNIST dataset, nor does it explicitly cite standard splits for it.
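The Dirichlet partitioning scheme the paper cites (Hsu et al., 2019) can be sketched as follows: for each class, worker proportions are drawn from Dir(α) and that class's examples are split accordingly, so small α yields heterogeneous shards and large α nearly uniform ones. This is an illustrative implementation, not the paper's code.

```python
import numpy as np

def dirichlet_partition(labels, num_workers, alpha, seed=None):
    """Split example indices across workers via per-class Dir(alpha) draws."""
    rng = np.random.default_rng(seed)
    shards = [[] for _ in range(num_workers)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Worker proportions for this class; alpha controls heterogeneity.
        props = rng.dirichlet(alpha * np.ones(num_workers))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for w, part in enumerate(np.split(idx, cuts)):
            shards[w].extend(part.tolist())
    return shards
```

With α = 0.1 most workers end up dominated by a few classes; with α = 10 each worker sees a near-uniform class mix, matching the heterogeneous and nearly homogeneous regimes in Figure 2.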
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | "For each method and topology, we perform a grid search over the learning rate η ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1} and select the value that yields the lowest error after 100K iterations. For DAT-SGD, we use constant weights αt = 1 for all t. [...] For our method and D-SGD, we use momentum with parameter β = 0.9. For each method and topology, the learning rate was selected via grid search over η ∈ {0.001, 0.01, 0.1}."
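The learning-rate selection described above amounts to a one-dimensional grid search: run the method once per candidate η and keep the one with the lowest final error. A minimal sketch, where `train_fn(eta) -> final_error` is a hypothetical callback standing in for a full training run:

```python
def grid_search_lr(train_fn, etas):
    """Return the learning rate with the lowest final error, and that error.

    train_fn : callable mapping a learning rate to the final error
               (e.g. error after 100K iterations for a fixed method/topology)
    etas     : iterable of candidate learning rates
    """
    results = {eta: train_fn(eta) for eta in etas}
    best = min(results, key=results.get)
    return best, results[best]
```

Per the quoted setup, this search would be repeated independently for each method and topology, over the grid {0.0001, ..., 0.1} in the synthetic experiment and {0.001, 0.01, 0.1} in the Fashion-MNIST experiment.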