AdaFed: Fair Federated Learning via Adaptive Common Descent Direction

Authors: Shayan Mohajer Hamidi, En-Hui Yang

TMLR 2024

Reproducibility checklist (variable, result, and the LLM's supporting response):
Research Type: Experimental. "We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods. By conducting thorough experiments over seven different datasets (six vision datasets and a language one), we show that AdaFed can yield a higher level of fairness among the clients while achieving similar prediction accuracy compared to the state-of-the-art fair FL algorithms." Section 7 is dedicated to experiments, with performance tables and figures.
Researcher Affiliation: Academia. Shayan Mohajer Hamidi (EMAIL), Department of Electrical and Computer Engineering, University of Waterloo; En-Hui Yang (EMAIL), Department of Electrical and Computer Engineering, University of Waterloo.
Pseudocode: Yes. Algorithm 1: AdaFed
1: Input: number of global epochs T, number of local epochs e, global learning rate η_t, local learning rate η, initial global model θ^0, local datasets {D_k}_{k=1}^K.
2: for t = 0, 1, . . . , T − 1 do
3:   Server randomly selects a subset of devices S_t and sends θ^t to them.
4:   for device k ∈ S_t in parallel do [local training]
5:     Store the value θ^t in θ_init; that is, θ_init ← θ^t.
6:     for e epochs do
7:       Perform (stochastic) gradient descent over local dataset D_k to update: θ^t ← θ^t − η ∇f_k(θ^t, D_k).
8:     end for
9:     Send the pseudo-gradient g_k := θ_init − θ^t and local loss value f_k(θ^t) to the server.
10:    end for
11:   for k = 1, 2, . . . , |S_t| do
12:     Find g_k from Equations (5) and (6).
13:   end for
14:   Find λ from Equation (14).
15:   Calculate d_t := Σ_{k=1}^{K} λ_k g_k.
16:   θ^{t+1} ← θ^t − η_t d_t.
17: end for
18: Output: Global model θ^T.
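Algorithm 1 can be sketched in plain Python. This is only an illustrative skeleton of the communication round: the paper's Equations (5), (6), and (14) for transforming the pseudo-gradients and deriving the weights λ are not reproduced in this excerpt, so the loss-proportional weighting below is a loudly hypothetical stand-in, not AdaFed's actual rule.

```python
# Sketch of one AdaFed communication round (Algorithm 1, structure only).
# ASSUMPTION: lam below is loss-proportional, a placeholder for the
# common-descent-direction weights the paper derives in Equation (14).

def local_update(theta, grad_fn, data, eta=0.1, epochs=1):
    """e epochs of (stochastic) gradient descent on one client's data."""
    theta = list(theta)
    for _ in range(epochs):
        g = grad_fn(theta, data)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta

def adafed_round(theta, clients, grad_fn, loss_fn, eta=0.1, eta_t=1.0, epochs=1):
    """One round: collect pseudo-gradients g_k = theta_init - theta_k and
    local losses, weight the g_k, and take a server step along d_t."""
    pseudo_grads, losses = [], []
    for data in clients:
        theta_k = local_update(theta, grad_fn, data, eta, epochs)
        pseudo_grads.append([ti - tk for ti, tk in zip(theta, theta_k)])
        losses.append(loss_fn(theta_k, data))
    total = sum(losses)
    lam = [l / total for l in losses]  # placeholder for Eq. (14)
    d_t = [sum(l * g[i] for l, g in zip(lam, pseudo_grads))
           for i in range(len(theta))]
    return [t - eta_t * d for t, d in zip(theta, d_t)]
```

With quadratic client objectives f_k(θ) = ||θ − c_k||², repeated rounds drive the global model into the convex hull of the client optima, which is the qualitative behavior the server loop is meant to exhibit.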
Open Source Code: No. The paper does not provide explicit statements about releasing code, nor does it include a link to a code repository for the described methodology.
Open Datasets: Yes. "Datasets: We conduct a thorough set of experiments over seven datasets. The results for four datasets, namely CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), FEMNIST (Caldas et al., 2018) and Shakespeare (McMahan et al., 2017), are reported in this section; those for Fashion-MNIST (Xiao et al., 2017), Tiny ImageNet (Le & Yang, 2015), and CINIC-10 (Darlow et al., 2018) are reported in Appendix D. In particular, to demonstrate the effectiveness of AdaFed in different FL scenarios, for each of the datasets reported in this section we consider two different FL setups. In addition, we tested the effectiveness of AdaFed over a real-world noisy dataset, namely Clothing1M (Xiao et al., 2015), in Appendix I."
Dataset Splits: Yes. "Setup 1: Following (Wang et al., 2021b), we sort the dataset by class and then split it into 200 shards. Each client randomly selects two shards without replacement, so that each has the same local dataset size. ... Setup 2: We distribute the dataset among the clients via Dirichlet allocation (Wang et al., 2020) with β = 0.5. ... FEMNIST-skewed: Here K = 100. We first sample 10 lowercase characters ('a' to 'j') from Extended MNIST (EMNIST), and then randomly assign 5 classes to each of the 100 devices. ... For CINIC-10, we add more non-iid-ness to the dataset by distributing the data among the clients using Dirichlet allocation with β = 0.5."
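The Dirichlet allocation used in Setup 2 can be sketched with only the standard library: for each class, a Dirichlet(β) vector decides what fraction of that class's samples each client receives. The function name and rounding scheme below are illustrative, not taken from the paper's code.

```python
import random

def dirichlet_split(labels, num_clients, beta=0.5, seed=0):
    """Partition sample indices among clients with a per-class
    Dirichlet(beta) allocation, in the spirit of Setup 2 above
    (Wang et al., 2020). Smaller beta gives stronger label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        # Dirichlet(beta) draw via normalized Gamma(beta, 1) samples
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        cum, cuts = 0.0, []
        for d in draws:
            cum += d / total
            cuts.append(round(cum * len(idx)))
        cuts[-1] = len(idx)  # guard against float rounding
        prev = 0
        for k, cut in enumerate(cuts):
            clients[k].extend(idx[prev:cut])
            prev = cut
    return clients
```

Every index is assigned to exactly one client, and with β = 0.5 the per-client class histograms come out visibly skewed, matching the non-iid intent of the setup.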
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. It mentions 'edge devices/clients' but does not specify the hardware used for the experimental setup itself.
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers (e.g., library or solver names pinned to versions such as Python 3.8 or PyTorch 1.9).
Experiment Setup: Yes. "Setup 1: ... We use a feedforward neural network with 2 hidden layers. We fix e = 1 and K = 100. We carry out 2000 rounds of communication, and sample 10% of the clients in each round. We run SGD on local datasets with stepsize η = 0.1. Setup 2: ... We use ResNet-18 (He et al., 2016) with Group Normalization (Wu & He, 2018). We perform 100 communication rounds, in each of which all clients participate. We set e = 1, K = 10 and η = 0.01. ... The best hyper-parameters for the benchmark methods are: q = 10 for q-FFL, ε = 0.5 for FedMGDA+, and (α, β) = (0.5, 0.5), (γ_s, γ_c) = (0.5, 0.9) for FedFA. The detailed results for different γ in AdaFed are reported in Table 10. We used γ = 5 as the best setting for Table 1."
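For quick reference, the two setups' hyper-parameters quoted above can be collected into plain config dicts. The key names are our own labels for the values reported in the excerpt; the model strings are descriptions, not constructors.

```python
# Hyper-parameters of the two FL setups as quoted above.
# Key names are illustrative labels chosen here, not the paper's.

SETUP_1 = {
    "model": "feedforward NN, 2 hidden layers",
    "local_epochs": 1,       # e
    "num_clients": 100,      # K
    "rounds": 2000,          # communication rounds
    "client_fraction": 0.1,  # 10% of clients sampled per round
    "local_lr": 0.1,         # eta
}

SETUP_2 = {
    "model": "ResNet-18 with Group Normalization",
    "local_epochs": 1,
    "num_clients": 10,
    "rounds": 100,
    "client_fraction": 1.0,  # all clients participate each round
    "local_lr": 0.01,
}
```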