Differentiated Aggregation to Improve Generalization in Federated Learning

Authors: Peyman Gholami, Hulya Seferoglu

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results confirm that FedALS outperforms the baselines in terms of accuracy in non-IID setups while also saving on communication costs across all setups. Our codes are available for reproducibility. In this section, we assess the performance of FedALS using the deep neural network model ResNet-20 for the CIFAR-10, CIFAR-100 (Krizhevsky, 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun et al., 1998) datasets.
Researcher Affiliation | Academia | Peyman Gholami, Department of Electrical and Computer Engineering, University of Illinois Chicago; Hulya Seferoglu, Department of Electrical and Computer Engineering, University of Illinois Chicago
Pseudocode | Yes | Algorithm 1: FedALS; Algorithm 2: FedAvg; Algorithm 3: SCAFFOLD; Algorithm 4: FedALS + SCAFFOLD
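For context on the baseline listed as Algorithm 2, a single FedAvg communication round amounts to broadcasting the global model, running local SGD on each client, and averaging the results. The sketch below is illustrative only and not the paper's implementation; the toy `local_sgd` loss, client count, and tensor shapes are assumptions.

```python
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    # Placeholder local update: each client takes `steps` gradient
    # steps on its own data, here with a toy quadratic loss
    # ||w - mean(data)||^2 standing in for real training.
    for _ in range(steps):
        grad = 2 * (w - data.mean(axis=0))
        w = w - lr * grad
    return w

def fedavg_round(global_w, client_data):
    # One FedAvg round: every client starts from the current global
    # model, runs local SGD, and the server averages the resulting
    # models with equal weights.
    client_models = [local_sgd(global_w.copy(), d) for d in client_data]
    return np.mean(client_models, axis=0)
```

FedALS (Algorithm 1) modifies how and how often different parts of the model are aggregated; the uniform full-model average above is only the baseline it is compared against.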
Open Source Code | Yes | Our codes are available for reproducibility.
Open Datasets | Yes | We evaluate the performance of FedALS using the deep neural network model ResNet-20 for the CIFAR-10, CIFAR-100 (Krizhevsky, 2009), SVHN (Netzer et al., 2011), and MNIST (LeCun et al., 1998) datasets. We also estimate the impact of FedALS on large language models (LLMs) by fine-tuning OPT-125M (Zhang et al., 2022) on the Multi-Genre Natural Language Inference (MultiNLI) corpus (Williams et al., 2018).
Dataset Splits | Yes | For image classification, we initially sorted the data based on their labels and subsequently divided it among nodes following this sorted sequence. In MultiNLI, we sorted the sentences based on their genre. ... Table 4: Data distribution: IID (shuffled and split), non-IID (sorted by label, then split). ... Table 5: Data distribution: IID (shuffled and split), non-IID (sorted by genre, then split).
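The label-sorted non-IID partitioning described in that row can be sketched as follows. This is a minimal reconstruction, assuming contiguous, roughly even shards; the function name and how remainders are handled are assumptions, not the paper's code.

```python
import numpy as np

def non_iid_split(labels, num_nodes):
    # Sort sample indices by label, then hand out contiguous shards,
    # so each node sees only a few (mostly adjacent) classes.
    order = np.argsort(labels, kind="stable")
    return np.array_split(order, num_nodes)
```

An IID split, by contrast, would shuffle the indices before splitting, so every node receives a mix of all classes.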
Hardware Specification | No | The experimentation was conducted on a network consisting of five nodes alongside a central server. No specific hardware details (GPU/CPU models, memory) are provided.
Software Dependencies | No | SGD with momentum was employed as the optimizer, with the momentum set to 0.9 and the weight decay to 10^-4. For the LLM fine-tuning, we employed a batch size of 16 sentences from the corpus, and the optimizer used was AdamW. Specific version numbers for software libraries or environments are not provided.
Experiment Setup | Yes | For image classification, we utilized a batch size of 64 per node. SGD with momentum was employed as the optimizer, with the momentum set to 0.9 and the weight decay to 10^-4. For the LLM fine-tuning, we employed a batch size of 16 sentences from the corpus, and the optimizer used was AdamW. In all the experiments, to perform a grid search for the learning rate, we conducted each experiment by multiplying and dividing the learning rate by powers of two, stopping each experiment after reaching a local optimum learning rate. We repeat each experiment 20 times and present the error bars associated with the randomness of the optimization. In every figure, we include the average and standard deviation error bars. A detailed experimental setup is provided in Appendix E of the supplementary materials. ... Table 4: local steps τ = 5, adaptation coefficient α = 10, batch size 64 per client, momentum 0.9, weight decay 10^-4, number of iterations 10^4 for IID and 2×10^4 for non-IID, 20 repetitions. ... Table 5: local steps τ = 5, adaptation coefficient α = 10, batch size 16 sentences per client, Adam β1 = 0.9, β2 = 0.999, ε = 10^-8, number of iterations 10^4 for IID and 2×10^4 for non-IID, 20 repetitions.
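The learning-rate search quoted above (multiply and divide by powers of two, stop at a local optimum) can be read as a simple hill climb over a power-of-two grid. The sketch below is one plausible reading of that procedure, not the authors' script; `evaluate` is a hypothetical stand-in for a full training run returning validation performance.

```python
def grid_search_lr(evaluate, lr0=0.1):
    # Hill-climb over learning rates on a power-of-two grid:
    # first keep doubling the rate while validation performance
    # improves, then keep halving it, stopping each direction
    # at the first local optimum.
    best_lr, best_score = lr0, evaluate(lr0)
    for factor in (2.0, 0.5):
        lr = best_lr * factor
        score = evaluate(lr)
        while score > best_score:
            best_lr, best_score = lr, score
            lr *= factor
            score = evaluate(lr)
    return best_lr
```

Note this finds a local optimum on the grid, which matches the stopping rule described; it does not guarantee a global best learning rate.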