Distributed Quasi-Newton Method for Fair and Fast Federated Learning

Authors: Shayan Mohajer Hamidi, Linfeng Ye

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through comprehensive experiments conducted on seven different datasets (six vision datasets and one language dataset), we demonstrate that DQN-Fed attains superior fairness level among clients, and converges faster compared to the state-of-the-art fair alternatives.
Researcher Affiliation Academia Shayan Mohajer Hamidi EMAIL Department of Electrical & Computer Engineering University of Waterloo Linfeng Ye EMAIL Edward S. Rogers Sr. Department of Electrical & Computer Engineering University of Toronto
Pseudocode Yes Algorithm 1: DQN-Fed.
Input: number of global epochs T, global learning rate η_t, number of local epochs E, local datasets {D_k}, k ∈ [K].
for t = 0, 1, ..., T−1 do
    Server randomly selects a subset of devices S_t and sends θ_t to them.
    for device k ∈ S_t in parallel do
        Set θ̂_k^1 = θ_t and θ̂_k^0 = θ_{t−1}.
        for e = 1, 2, ..., E do
            Perform the BFGS algorithm as follows:
            Set s_k^e = θ̂_k^e − θ̂_k^{e−1}, and y_k^e = ∇f(θ̂_k^e) − ∇f(θ̂_k^{e−1}).
            Iteratively update matrix B_k^{e+1} using information from B_k^e, s_k^e, y_k^e according to:
            B_k^{e+1} = B_k^e − (B_k^e s_k^e (s_k^e)^T B_k^e) / ((s_k^e)^T B_k^e s_k^e) + (y_k^e (y_k^e)^T) / ((s_k^e)^T y_k^e).   (19)
        end
        Use B_k^E to calculate d_k^t from Equation (18).
        Send local gradient g_k = ∇f(θ̂_k^E) and d_k^t to the server.
    end
    Server finds {g̃_k}, k ∈ [K], from Equations (7) and (8).
    Server finds λ* from Equation (12).
    Server calculates d_t := Σ_{k=1}^K λ*_k g̃_k.
    Server updates the global model as θ_{t+1} ← θ_t − η_t d_t.
end
Output: global model θ_T.
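The BFGS update in Equation (19) can be sketched in NumPy as below. The function name and the degenerate-curvature guard are illustrative assumptions, not the authors' code; the update itself follows the equation term by term.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian approximation B (Equation (19) sketch).

    B: current d x d approximation, s: parameter difference
    theta_e - theta_{e-1}, y: gradient difference grad_e - grad_{e-1}.
    """
    Bs = B @ s
    sBs = s @ Bs  # curvature of B along s
    sy = s @ y    # secant curvature
    # Guard (an assumption, not in the paper): skip the update when the
    # curvature terms are numerically degenerate to avoid division by ~0.
    if abs(sBs) < 1e-12 or abs(sy) < 1e-12:
        return B
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy
```

A quick sanity check is the secant condition: after the update, B_k^{e+1} s_k^e = y_k^e holds exactly, and the result stays symmetric.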
Open Source Code Yes The Code for paper is publicly available at https://anonymous.4open.science/r/DQN-Fed-FDD2.
Open Datasets Yes Datasets: We conduct a comprehensive set of experiments across seven datasets. In this section, we present results for four datasets: CIFAR-{10, 100} (Krizhevsky et al., 2009), FEMNIST (Caldas et al., 2018), and Shakespeare (McMahan et al., 2017). Results for Fashion-MNIST (Xiao et al., 2017), Tiny ImageNet (Le & Yang, 2015), and CINIC-10 (Darlow et al., 2018) are discussed in Appendix G. Furthermore, we evaluate DQN-Fed's performance on a real-world noisy dataset, Clothing1M (Xiao et al., 2015), in Appendix I.
Dataset Splits Yes Setup 1: Following Wang et al. (2021b), we sort the dataset by class and split it into 200 shards. Each client randomly selects two shards without replacement, so that all clients have the same local dataset size. We use a feedforward neural network with 2 hidden layers. We fix E = 1 and K = 100. We carry out 2000 rounds of communication, sampling 10% of the clients in each round. We run SGD on local datasets with stepsize η = 0.1. Setup 2: We distribute the dataset among the clients via Dirichlet allocation (Wang et al., 2020) with β = 0.5. We use ResNet-18 (He et al., 2016) with Group Normalization (Wu & He, 2018). We perform 100 communication rounds, in each of which all clients participate. We set E = 1, K = 10 and η = 0.01.
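The two partitioning setups described above can be sketched as follows. Function names, defaults, and the seeding scheme are assumptions for illustration; they are not the authors' released code.

```python
import numpy as np

def shard_split(labels, num_shards=200, shards_per_client=2, seed=0):
    """Setup 1 (sketch): sort indices by class, cut into equal shards,
    and give each client `shards_per_client` shards without replacement."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                  # indices sorted by class
    shards = np.array_split(order, num_shards)  # equal-size shards
    perm = rng.permutation(num_shards)          # assign shards randomly
    num_clients = num_shards // shards_per_client
    return [np.concatenate([shards[perm[c * shards_per_client + j]]
                            for j in range(shards_per_client)])
            for c in range(num_clients)]

def dirichlet_split(labels, num_clients=10, beta=0.5, seed=0):
    """Setup 2 (sketch): for each class, allocate its samples to clients
    with proportions drawn from Dirichlet(beta)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet([beta] * num_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cl, part in enumerate(np.split(idx, cuts)):
            client_idx[cl].extend(part.tolist())
    return client_idx
```

With 1000 CIFAR-like samples over 10 classes, `shard_split` yields 100 clients holding two classes each (the shard setup's pathological non-IID case), while `dirichlet_split` produces smoother but still skewed class mixtures as β shrinks.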
Hardware Specification Yes These results, presented in Table 2, were obtained using a single NVIDIA RTX 3090 GPU.
Software Dependencies No The paper mentions using the BFGS algorithm and refers to specific model architectures like ResNet-18 and LSTM, but does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or their dependencies.
Experiment Setup Yes For the first setting, we also measure and report the wall-clock time (in seconds) for both the benchmark methods and DQN-Fed. These results, presented in Table 2, were obtained using a single NVIDIA RTX 3090 GPU. As shown, the second-order methods (last three columns) exhibit higher computational times. Notably, among the second-order methods, DQN-Fed demonstrates the fastest runtime. For all benchmark methods, we conducted a grid search to identify the optimal hyper-parameters for the underlying algorithms. The parameters tested for each method are outlined below:
q-FFL: q ∈ {0, 0.001, 0.01, 0.1, 1, 2, 5, 10}.
TERM: t ∈ {0.1, 0.5, 1, 2, 5}.
FedLF: η_t ∈ {0.01, 0.05, 0.1, 0.5, 1}.
Ditto: λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 2, 5}.
FedMGDA+: ϵ ∈ {0.01, 0.05, 0.1, 0.5, 1}.
FedHEAL: (α, β) = (0.5, 0.5), (γ_s, γ_c) = (0.5, 0.9).
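The grid search over these per-method hyper-parameters can be sketched as below. The `evaluate` callback is a hypothetical stand-in for training one benchmark configuration and returning its score; the grids mirror the lists above.

```python
from itertools import product

# Per-method hyper-parameter grids, transcribed from the setup description.
GRIDS = {
    "q-FFL":    {"q": [0, 0.001, 0.01, 0.1, 1, 2, 5, 10]},
    "TERM":     {"t": [0.1, 0.5, 1, 2, 5]},
    "FedLF":    {"eta_t": [0.01, 0.05, 0.1, 0.5, 1]},
    "Ditto":    {"lam": [0.01, 0.05, 0.1, 0.5, 1, 2, 5]},
    "FedMGDA+": {"eps": [0.01, 0.05, 0.1, 0.5, 1]},
}

def grid_search(method, evaluate):
    """Exhaustively try every setting in GRIDS[method]; `evaluate(params)`
    (hypothetical: one full training run) returns a score to maximize."""
    grid = GRIDS[method]
    keys = list(grid)
    best, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

In practice each `evaluate` call is a full federated training run, so the grid sizes above (5 to 8 settings per method) keep the sweep tractable.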