Enhancing Parallelism in Decentralized Stochastic Convex Optimization
Authors: Ofri Eisen, Ron Dorfman, Kfir Yehuda Levy
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate our method on both a synthetic least squares problem and an image classification task. All experiments are conducted using three random seeds, and we report the averaged results. [...] In Figure 1, we plot the final error as a function of the number of machines, across four configurations defined by σ, ζ ∈ {1, 10}, with different colors indicating the underlying topology. [...] Figure 2 depicts the test accuracy for M = 8 and 16 machines under heterogeneous (α = 0.1) and nearly homogeneous (α = 10) data, with different colors indicating the compared methods. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Technion, Haifa, Israel. Correspondence to: Ofri Eisen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Decentralized SGD (D-SGD) [...] Algorithm 2 Decentralized Anytime SGD (DAT-SGD) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code nor a link to a code repository. |
| Open Datasets | Yes | Next, we evaluate our method on the Fashion-MNIST (Xiao et al., 2017) image classification task using the LeNet (LeCun et al., 1998) architecture. |
| Dataset Splits | No | The paper states: "The data is partitioned among workers following a Dirichlet distribution with parameter α, which controls the heterogeneity level (Hsu et al., 2019)." This describes data distribution across machines but does not provide specific train/test/validation splits for the Fashion MNIST dataset, nor does it explicitly cite standard splits for it. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | For each method and topology, we perform a grid search over the learning rate η ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1} and select the value that yields the lowest error after 100K iterations. For DAT-SGD, we use constant weights αt = 1 for all t. [...] For our method and D-SGD, we use momentum with parameter β = 0.9. For each method and topology, the learning rate was selected via grid search over η ∈ {0.001, 0.01, 0.1}. |
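The paper's Algorithm 1 (D-SGD) alternates a local stochastic gradient step with gossip averaging over a communication topology. A minimal NumPy sketch of that loop is below — it is illustrative only, not the authors' implementation: the ring mixing matrix, the gradient oracle signature, and the omission of the paper's β = 0.9 momentum are all assumptions made here for brevity.

```python
import numpy as np

def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m workers:
    each worker averages equally with itself and its two neighbours."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1 / 3
        W[i, (i - 1) % m] = 1 / 3
        W[i, (i + 1) % m] = 1 / 3
    return W

def d_sgd(grad_fn, x0, W, eta=0.01, steps=1000, rng=None):
    """Plain D-SGD: every worker takes a local stochastic gradient
    step, then gossip-averages its iterate with neighbours via W."""
    if rng is None:
        rng = np.random.default_rng(0)
    m = W.shape[0]
    X = np.tile(x0, (m, 1)).astype(float)  # one row of parameters per worker
    for _ in range(m * 0 + steps):
        G = np.stack([grad_fn(i, X[i], rng) for i in range(m)])
        X = W @ (X - eta * G)  # local step followed by gossip averaging
    return X
```

Because W is doubly stochastic, the worker-average iterate follows centralized SGD exactly, while per-worker iterates disagree by an O(η) consensus error — which is the effect the paper's Figure 1 topology comparison probes.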
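The Dataset Splits row quotes the paper's Dirichlet partitioning of data across workers (Hsu et al., 2019). A common way to realize this — sketched here as an assumption, since the paper does not give its partitioning code — is to draw, for each class, a Dirichlet(α) vector of per-worker proportions:

```python
import numpy as np

def dirichlet_partition(labels, num_workers, alpha, rng=None):
    """Split sample indices among workers so that, per class, the share
    assigned to each worker is drawn from Dirichlet(alpha).
    Small alpha -> heterogeneous shards; large alpha -> near-uniform."""
    if rng is None:
        rng = np.random.default_rng(0)
    shards = [[] for _ in range(num_workers)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        p = rng.dirichlet(alpha * np.ones(num_workers))  # class shares
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for w, part in enumerate(np.split(idx, cuts)):
            shards[w].extend(part.tolist())
    return shards
```

With α = 0.1 most of a class lands on a few workers (the paper's heterogeneous regime); with α = 10 the shards are close to uniform.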
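The quoted setup selects, per method and topology, the learning rate with the lowest final error over a fixed grid. That protocol reduces to a few lines; `run_fn` below is a hypothetical callable (not from the paper) that trains with a given η and returns the final error.

```python
def select_lr(run_fn,
              etas=(1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1)):
    """Grid search mirroring the paper's protocol: run each candidate
    learning rate and keep the one with the lowest final error."""
    results = {eta: run_fn(eta) for eta in etas}
    return min(results, key=results.get)
```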