FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models
Authors: Kai Yi, Georg Meinhardt, Laurent Condat, Peter Richtárik
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, using the popular TopK compressor and quantization, demonstrate its prowess in substantially reducing communication overheads in heterogeneous settings. ... We conducted detailed comparisons and ablation studies, validating the effectiveness of our approach. |
| Researcher Affiliation | Academia | All authors: Department of Computer Science, King Abdullah University of Science and Technology (KAUST). Laurent Condat and Peter Richtárik: also SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI). |
| Pseudocode | Yes | Algorithm 1 FedComLoc. 1: stepsize γ > 0, probability p > 0, initial iterates x_{1,0} = ... = x_{n,0} ∈ R^d, initial control variates h_{1,0}, ..., h_{n,0} ∈ R^d on each client such that Σ_{i=1}^n h_{i,0} = 0, number of iterations T ≥ 1, compressor C(·) ∈ {TopK(·), Qr(·), ...}. 2: server: flip a coin θ_t ∈ {0, 1} T times, where Prob(θ_t = 1) = p (decide when to skip communication). 3: send the sequence θ_0, ..., θ_{T−1} to all workers. 4: for t = 0, 1, ..., T−1 do. 5: sample clients S ⊆ {1, 2, ..., n}. 6: in parallel on all workers i ∈ S do. 7: FedComLoc-Local: local compression g_{i,t}(x_{i,t}) = g_{i,t}(C(x_{i,t})). 8: x̂_{i,t+1} = x_{i,t} − γ(g_{i,t}(x_{i,t}) − h_{i,t}) (local gradient-type step adjusted via the local control variate h_{i,t}). 9: FedComLoc-Com: uplink compression x̂_{i,t+1} = C(x̂_{i,t+1}). 10: if θ_t = 1 then. 11: x_{i,t+1} = (1/n) Σ_{i=1}^n x̂_{i,t+1} (average the iterates, with small probability p). 12: FedComLoc-Global: downlink compression x_{i,t+1} = C(x_{i,t+1}). 13: else. 14: x_{i,t+1} = x̂_{i,t+1} (skip communication). 15: end if. 16: h_{i,t+1} = h_{i,t} + (p/γ)(x_{i,t+1} − x̂_{i,t+1}) (update the local control variate h_{i,t}). 17: end local updates. 18: end for. |
| Open Source Code | No | Our intention is to make the code publicly available upon the acceptance of our work. |
| Open Datasets | Yes | Our experiments are conducted on FedMNIST (LeCun, 1998) and FedCIFAR10 (Krizhevsky et al., 2009) with the data processing framework FedLab (Zeng et al., 2023). ... To further evaluate our method on more realistic workloads, we conducted experiments on two widely used FL benchmarks: FEMNIST (Caldas et al., 2018) and Shakespeare (McMahan et al., 2017). |
| Dataset Splits | No | The paper describes how data is distributed across clients via a Dirichlet distribution and gives total sample counts (MNIST: 60,000; CIFAR10: 60,000; FEMNIST: 671,585; Shakespeare: 16,068) as well as client participation rates (e.g., "100 clients from which 10 are uniformly chosen"). However, it does not specify percentages, absolute sample counts, or a methodology for splitting the datasets into training, validation, and test sets, so the splits cannot be fully reproduced. |
| Hardware Specification | Yes | Our experimental setup involved the use of NVIDIA A100 or V100 GPUs, allocated based on their availability within our computing cluster. |
| Software Dependencies | Yes | We developed our framework using PyTorch version 1.4.0 and torchvision version 0.5.0, operating within a Python 3.8 environment. The FedLab framework (Zeng et al., 2023) was employed for the implementation of our code. |
| Experiment Setup | Yes | In the absence of specific clarifications, we adopt the Dirichlet factor α = 0.7. To balance both communication and local computation costs, we use p = 0.1, resulting in an average of 10 local iterations per communication round. The learning rate is chosen by conducting a grid search over the set {0.005, 0.01, 0.05, 0.1, 0.5}. The experiments are run for 2500 communication rounds for the CNN on Fed CIFAR10 and 500 rounds for the MLP on Fed MNIST. Furthermore, the dataset is distributed across 100 clients from which 10 are uniformly chosen to participate in each global round. |
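The core mechanics quoted above (probabilistic communication skipping, a TopK compressor, and per-client control variates) can be sketched as one round of Algorithm 1. This is a minimal illustration only, assuming full client participation, a deterministic gradient oracle, and the uplink-compression (FedComLoc-Com) variant; the function names `topk` and `fedcomloc_round` are ours, not from the paper's codebase.

```python
import numpy as np

def topk(x, k):
    """TopK compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def fedcomloc_round(xs, hs, grad_fn, gamma, p, k, rng):
    """One round of the sketched algorithm, full participation.

    xs: per-client iterates, hs: per-client control variates (sum zero),
    grad_fn(i, x): gradient oracle for client i.
    """
    n = len(xs)
    theta = rng.random() < p  # server coin flip: communicate w.p. p
    x_hat = []
    for i in range(n):
        g = grad_fn(i, xs[i])
        step = xs[i] - gamma * (g - hs[i])  # local step, control-variate corrected
        x_hat.append(topk(step, k))         # uplink compression
    if theta:
        avg = sum(x_hat) / n                # server averages compressed iterates
        new_xs = [avg.copy() for _ in range(n)]
    else:
        new_xs = [v.copy() for v in x_hat]  # skip communication this round
    # control-variate update: h_{i,t+1} = h_{i,t} + (p/γ)(x_{i,t+1} − x̂_{i,t+1})
    new_hs = [hs[i] + (p / gamma) * (new_xs[i] - x_hat[i]) for i in range(n)]
    return new_xs, new_hs
```

A useful sanity check on the control variates: since the averaging step distributes the same mean to every client, the updates cancel across clients and the sum of the `hs` stays zero throughout training, matching the initialization condition Σ h_{i,0} = 0 in the pseudocode.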