Scalable Decentralized Learning with Teleportation

Authors: Yuki Takezawa, Sebastian Stich

ICLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental — "Experimentally, we showed that TELEPORTATION can train neural networks more stably and achieve higher accuracy than Decentralized SGD." The evidence comes from Section 5 (Experiment), covering 5.1 Synthetic Experiment, 5.2 Neural Networks, and 5.3 Comparison under Heterogeneous Networks: "We depict the results in Fig. 2. For all cases, TELEPORTATION required fewer iterations to reach the target accuracy than Decentralized SGD."
Researcher Affiliation: Academia — Yuki Takezawa (Kyoto University, OIST); Sebastian U. Stich (CISPA Helmholtz Center for Information Security).
Pseudocode: Yes — Algorithm 1 ("Simple version of TELEPORTATION") and Algorithm 2 ("Efficient hyperparameter search for TELEPORTATION").
Open Source Code: No — the paper provides no explicit statement about, or link to, open-source code for the described methodology.
Open Datasets: Yes — "We used Fashion MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky, 2009) as datasets."
Dataset Splits: No — "We used Fashion MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky, 2009) as datasets and distributed the data to nodes using Dirichlet distribution with parameter α (Hsu et al., 2019)." The main text gives no explicit training/validation/test splits (percentages or counts) and cites no predefined splits.
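The Dirichlet-based data distribution cited above (Hsu et al., 2019) is a standard way to create label-skewed node partitions. The sketch below is a generic illustration of that scheme, not code from the paper; the function name and signature are hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, num_nodes, alpha, seed=0):
    """Split sample indices across nodes with label skew controlled by alpha.

    For each class, per-node proportions are drawn from Dirichlet(alpha);
    smaller alpha yields more heterogeneous (non-IID) label distributions.
    """
    rng = np.random.default_rng(seed)
    node_indices = [[] for _ in range(num_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-node share of this class, then cut points into the index array.
        proportions = rng.dirichlet(alpha * np.ones(num_nodes))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cuts)):
            node_indices[node].extend(part.tolist())
    return [np.asarray(ix, dtype=int) for ix in node_indices]
```

Every sample is assigned to exactly one node, so the partitions are disjoint and cover the dataset regardless of alpha.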
Hardware Specification: Yes — AMD EPYC 7702 or Intel Xeon Gold 6230 CPUs (Table 3); 8× Titan GPUs (Table 4); 8× A6000 or 8× RTX 3090 GPUs (Table 5).
Software Dependencies: No — no specific software dependencies with version numbers (e.g., library or framework versions) are explicitly mentioned in the paper.
Experiment Setup: Yes — Table 3: step size grid search over {0.1, 0.075, 0.05, 0.025, 0.01, …, 0.0001}. Table 4: LeNet model, step size grid search over {0.1, 0.01, 0.001}, batch size 32, momentum 0.9, 200 epochs. Table 5: VGG model, step size grid search over {0.1, 0.01, 0.001}, cosine decay scheduler, batch size 32, momentum 0.9, 500 epochs.
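The step-size grids above amount to a simple exhaustive search: train once per candidate and keep the one with the best validation metric. A minimal sketch, assuming a caller-supplied `train_and_eval` function that returns a validation accuracy (the helper name is illustrative, not from the paper):

```python
def grid_search(train_and_eval, grid):
    """Run train_and_eval for each step size in the grid.

    Returns (best_step_size, best_score), where best_score is the
    highest value of train_and_eval over the grid.
    """
    results = {lr: train_and_eval(lr) for lr in grid}
    best_lr = max(results, key=results.get)
    return best_lr, results[best_lr]
```

For the Table 4 setting this would be called with `grid=[0.1, 0.01, 0.001]` and a `train_and_eval` that trains LeNet for 200 epochs at the given step size.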