FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models

Authors: Kai Yi, Georg Meinhardt, Laurent Condat, Peter Richtárik

TMLR 2025

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. "Extensive experiments, using the popular TopK compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings. ... We conducted detailed comparisons and ablation studies, validating the effectiveness of our approach."
Researcher Affiliation: Academia. Kai Yi (EMAIL), Department of Computer Science, King Abdullah University of Science and Technology (KAUST); Georg Meinhardt (EMAIL), Department of Computer Science, KAUST; Laurent Condat (EMAIL), Department of Computer Science, KAUST, and SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI); Peter Richtárik (EMAIL), Department of Computer Science, KAUST, and SDAIA-KAUST AI.
Pseudocode: Yes. Algorithm 1 (FedComLoc):
1: stepsize γ > 0, probability p > 0, initial iterate x_{1,0} = ... = x_{n,0} ∈ R^d, initial control variates h_{1,0}, ..., h_{n,0} ∈ R^d on each client such that Σ_{i=1}^n h_{i,0} = 0, number of iterations T ≥ 1, compressor C(·) ∈ {TopK(·), Qr(·), ...}
2: server: flip a coin θ_t ∈ {0, 1}, T times, where Prob(θ_t = 1) = p ▷ Decide when to skip communication
3: send the sequence θ_0, ..., θ_{T−1} to all workers
4: for t = 0, 1, ..., T−1 do
5:   sample clients S ⊆ {1, 2, 3, ..., n}
6:   in parallel on all workers i ∈ S do
7:     FedComLoc-Local: local compression g_{i,t}(x_{i,t}) = g_{i,t}(C(x_{i,t}))
8:     x̂_{i,t+1} = x_{i,t} − γ (g_{i,t}(x_{i,t}) − h_{i,t}) ▷ Local gradient-type step adjusted via the local control variate h_{i,t}
9:     FedComLoc-Com: uplink compression x̂_{i,t+1} = C(x̂_{i,t+1})
10:    if θ_t = 1 then
11:      x_{i,t+1} = (1/n) Σ_{i=1}^n x̂_{i,t+1} ▷ Average the iterates (with small probability p)
12:      FedComLoc-Global: downlink compression x_{i,t+1} = C(x_{i,t+1})
13:    else
14:      x_{i,t+1} = x̂_{i,t+1} ▷ Skip communication
15:    end if
16:    h_{i,t+1} = h_{i,t} + (p/γ)(x_{i,t+1} − x̂_{i,t+1}) ▷ Update the local control variate h_{i,t}
17:  end local updates
18: end for
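The compressed local step (lines 7-8 of Algorithm 1) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names and the toy quadratic loss are hypothetical, and only the TopK compressor branch is shown.

```python
import numpy as np

def top_k(x, k):
    """TopK compressor: keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def fedcomloc_local_step(x, h, grad_fn, gamma, k):
    """One FedComLoc-Local iteration (Algorithm 1, lines 7-8): evaluate the
    gradient at the compressed iterate, then take a gradient-type step
    corrected by the local control variate h."""
    g = grad_fn(top_k(x, k))      # line 7: gradient at the compressed iterate
    return x - gamma * (g - h)    # line 8: control-variate-adjusted step

# Toy usage with f(x) = 0.5 * ||x||^2, so grad f(x) = x.
x = np.array([3.0, -0.5, 2.0, 0.1])
h = np.zeros_like(x)
x_new = fedcomloc_local_step(x, h, grad_fn=lambda z: z, gamma=0.1, k=2)
```

The same `top_k` call would also serve as the uplink/downlink compressor C(·) on lines 9 and 12; the quantizer Qr(·) mentioned in the algorithm would slot in the same way.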
Open Source Code: No. "Our intention is to make the code publicly available upon the acceptance of our work."
Open Datasets: Yes. "Our experiments are conducted on FedMNIST (LeCun, 1998) and FedCIFAR10 (Krizhevsky et al., 2009) with the data processing framework FedLab (Zeng et al., 2023). ... To further evaluate our method on more realistic workloads, we conducted experiments on two widely used FL benchmarks: FEMNIST (Caldas et al., 2018) and Shakespeare (McMahan et al., 2017)."
Dataset Splits: No. The paper describes how data is distributed across clients via a Dirichlet distribution and reports total sample counts (MNIST: 60,000; CIFAR10: 60,000; FEMNIST: 671,585; Shakespeare: 16,068) as well as client participation rates (e.g., "100 clients from which 10 are uniformly chosen"). However, it does not state explicit percentages, absolute per-split sample counts, or the methodology for splitting each dataset into training, validation, and test sets, so the splits cannot be fully reproduced in the conventional sense.
Hardware Specification: Yes. "Our experimental setup involved the use of NVIDIA A100 or V100 GPUs, allocated based on their availability within our computing cluster."
Software Dependencies: Yes. "We developed our framework using PyTorch version 1.4.0 and torchvision version 0.5.0, operating within a Python 3.8 environment. The FedLab framework (Zeng et al., 2023) was employed for the implementation of our code."
Experiment Setup: Yes. "In the absence of specific clarifications, we adopt the Dirichlet factor α = 0.7. To balance both communication and local computation costs, we use p = 0.1, resulting in an average of 10 local iterations per communication round. The learning rate is chosen by conducting a grid search over the set {0.005, 0.01, 0.05, 0.1, 0.5}. The experiments are run for 2500 communication rounds for the CNN on FedCIFAR10 and 500 rounds for the MLP on FedMNIST. Furthermore, the dataset is distributed across 100 clients from which 10 are uniformly chosen to participate in each global round."
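The client setup described above (Dirichlet partition with α = 0.7 over 100 clients, 10 sampled per round) can be sketched as follows. This is a minimal NumPy sketch, not the paper's FedLab-based pipeline: the random labels stand in for a real dataset, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_classes, alpha = 100, 10, 0.7

# Stand-in labels (e.g., MNIST has 60,000 training samples over 10 classes).
labels = rng.integers(0, n_classes, size=60_000)

# For each class, draw client proportions from Dirichlet(alpha) and split that
# class's sample indices accordingly: smaller alpha -> more heterogeneous clients.
client_indices = [[] for _ in range(n_clients)]
for c in range(n_classes):
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    proportions = rng.dirichlet(alpha * np.ones(n_clients))
    cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
    for client, part in enumerate(np.split(idx, cuts)):
        client_indices[client].extend(part.tolist())

# Each global round, 10 of the 100 clients are sampled uniformly without replacement.
participants = rng.choice(n_clients, size=10, replace=False)
```

Every sample is assigned to exactly one client, so the partition covers the full dataset; FedLab ships its own partitioners that implement this scheme with more options.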