Differentiated Aggregation to Improve Generalization in Federated Learning
Authors: Peyman Gholami, Hulya Seferoglu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results confirm that FedALS outperforms the baselines in terms of accuracy in non-IID setups while also saving on communication costs across all setups. Our codes are available for reproducibility. In this section, we assess the performance of FedALS using the deep neural network model ResNet-20 for the CIFAR-10, CIFAR-100 (Krizhevsky, 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun et al., 1998) datasets. |
| Researcher Affiliation | Academia | Peyman Gholami, Department of Electrical and Computer Engineering, University of Illinois Chicago; Hulya Seferoglu, Department of Electrical and Computer Engineering, University of Illinois Chicago |
| Pseudocode | Yes | Algorithm 1 FedALS; Algorithm 2 FedAvg; Algorithm 3 SCAFFOLD; Algorithm 4 FedALS + SCAFFOLD |
| Open Source Code | Yes | Our codes are available for reproducibility. |
| Open Datasets | Yes | We evaluate the performance of FedALS using the deep neural network model ResNet-20 for the CIFAR-10, CIFAR-100 (Krizhevsky, 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun et al., 1998) datasets. We also estimate the impact of FedALS on large language models (LLMs) in fine-tuning OPT-125M (Zhang et al., 2022) on the Multi-Genre Natural Language Inference (MultiNLI) corpus (Williams et al., 2018). |
| Dataset Splits | Yes | For image classification, we initially sorted the data based on their labels and subsequently divided it among nodes following this sorted sequence. In MultiNLI, we sorted the sentences based on their genre. ... Table 4: Data distribution IID (shuffled and split), non-IID (sorted based on labels then split) ... Table 5: Data distribution IID (shuffled and split), non-IID (sorted based on genre then split) |
| Hardware Specification | No | The experimentation was conducted on a network consisting of five nodes alongside a central server. No specific hardware details (GPU/CPU models, memory) are provided. |
| Software Dependencies | No | SGD with momentum was employed as the optimizer, with the momentum set to 0.9 and the weight decay to 10^-4. For the LLM fine-tuning, we employed a batch size of 16 sentences from the corpus, and the optimizer used was AdamW. Specific version numbers for software libraries or environments are not provided. |
| Experiment Setup | Yes | For image classification, we utilized a batch size of 64 per node. SGD with momentum was employed as the optimizer, with the momentum set to 0.9 and the weight decay to 10^-4. For the LLM fine-tuning, we employed a batch size of 16 sentences from the corpus, and the optimizer used was AdamW. In all the experiments, to perform a grid search for the learning rate, we conducted each experiment by multiplying and dividing the learning rate by powers of two, stopping each experiment after reaching a local optimum learning rate. We repeat each experiment 20 times and present the error bars associated with the randomness of the optimization. In every figure, we include the average and standard deviation error bars. Detailed experimental setup is provided in Appendix E of the supplementary materials. ... Table 4: Local steps τ = 5, adaptation coefficient α = 10, batch size 64 per client, momentum 0.9, weight decay 10^-4, number of iterations 10^4 for IID and 2×10^4 for non-IID, repetitions 20. ... Table 5: Local steps τ = 5, adaptation coefficient α = 10, batch size 16 sentences per client, Adam β1 = 0.9, Adam β2 = 0.999, Adam ε = 10^-8, number of iterations 10^4 for IID and 2×10^4 for non-IID, repetitions 20. |
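The dataset-split scheme reported above (IID: shuffle then split; non-IID: sort by label, then split contiguously among nodes) is simple enough to sketch in plain Python. This is a minimal illustration of the described scheme, not the authors' code; the function names `partition_iid` and `partition_non_iid` are hypothetical.

```python
import random

def partition_non_iid(labels, num_nodes):
    # Non-IID scheme from the paper: sort example indices by label,
    # then assign contiguous shards of the sorted order to each node.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shard = len(order) // num_nodes
    return [order[k * shard:(k + 1) * shard] for k in range(num_nodes)]

def partition_iid(labels, num_nodes, seed=0):
    # IID scheme: shuffle all indices, then split evenly among nodes.
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    shard = len(order) // num_nodes
    return [order[k * shard:(k + 1) * shard] for k in range(num_nodes)]
```

With, say, 100 examples over 10 classes and 5 nodes, the non-IID split gives each node examples from only 2 classes, which is what makes the setup heterogeneous; the same genre-sorted idea applies to the MultiNLI split.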
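The learning-rate grid search described in the setup (multiply and divide the rate by powers of two, stopping once a local optimum is reached) can be sketched as a small hill-climbing loop. This is an assumed reading of the procedure, not the authors' implementation; `evaluate` is a hypothetical callback that trains with a given learning rate and returns validation accuracy.

```python
def grid_search_lr(evaluate, lr0=0.1):
    # Powers-of-two learning-rate search: starting from lr0, keep doubling
    # while accuracy improves, then keep halving while accuracy improves.
    # Stops at a local optimum on the power-of-two grid.
    best_lr, best_acc = lr0, evaluate(lr0)
    for factor in (2.0, 0.5):          # search upward, then downward
        lr = best_lr * factor
        acc = evaluate(lr)
        while acc > best_acc:          # continue only while accuracy improves
            best_lr, best_acc = lr, acc
            lr *= factor
            acc = evaluate(lr)
    return best_lr, best_acc
```

Each call to `evaluate` corresponds to one full training run at that rate, so the search cost is a handful of runs rather than a dense sweep.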