Distributed Quasi-Newton Method for Fair and Fast Federated Learning
Authors: Shayan Mohajer Hamidi, Linfeng Ye
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments conducted on seven different datasets (six vision datasets and one language dataset), we demonstrate that DQN-Fed attains superior fairness level among clients, and converges faster compared to the state-of-the-art fair alternatives. |
| Researcher Affiliation | Academia | Shayan Mohajer Hamidi, Department of Electrical & Computer Engineering, University of Waterloo; Linfeng Ye, Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto |
| Pseudocode | Yes | Algorithm 1: DQN-Fed. Input: number of global epochs T, global learning rate η_t, number of local epochs E, local datasets {D_k}_{k∈[K]}. For t = 0, 1, ..., T−1: the server randomly selects a subset of devices S_t and sends θ_t to them. For each device k ∈ S_t in parallel: set θ̂^1_k = θ_t and θ̂^0_k = θ_{t−1}; for e = 1, 2, ..., E, perform the BFGS update: set s^e_k = θ̂^e_k − θ̂^{e−1}_k and y^e_k = ∇f(θ̂^e_k) − ∇f(θ̂^{e−1}_k), then iteratively update B^{e+1}_k = B^e_k − (B^e_k s^e_k (s^e_k)^T B^e_k) / ((s^e_k)^T B^e_k s^e_k) + (y^e_k (y^e_k)^T) / ((s^e_k)^T y^e_k) (Eq. 19). Use B^E_k to calculate d^t_k from Equation (18), then send the local gradient g_k = ∇f(θ̂^E_k) and d^t_k to the server. The server finds {g̃_k}_{k∈[K]} from Equations (7) and (8), finds λ* from Equation (12), calculates d_t := Σ_{k=1}^{K} λ*_k g̃_k, and updates the global model as θ_{t+1} ← θ_t − η_t d_t. Output: global model θ_T. |
| Open Source Code | Yes | The code for the paper is publicly available at https://anonymous.4open.science/r/DQN-Fed-FDD2. |
| Open Datasets | Yes | Datasets: We conduct a comprehensive set of experiments across seven datasets. In this section, we present results for four datasets: CIFAR-{10, 100} (Krizhevsky et al., 2009), FEMNIST (Caldas et al., 2018), and Shakespeare (McMahan et al., 2017). Results for Fashion-MNIST (Xiao et al., 2017), Tiny ImageNet (Le & Yang, 2015), and CINIC-10 (Darlow et al., 2018) are discussed in Appendix G. Furthermore, we evaluate DQN-Fed's performance on a real-world noisy dataset, Clothing1M (Xiao et al., 2015), in Appendix I. |
| Dataset Splits | Yes | Setup 1: Following Wang et al. (2021b), we sort the dataset by class and then split it into 200 shards. Each client randomly selects two shards without replacement, so that each has the same local dataset size. We use a feedforward neural network with 2 hidden layers. We fix E = 1 and K = 100. We carry out 2000 rounds of communication, and sample 10% of the clients in each round. We run SGD on local datasets with stepsize η = 0.1. Setup 2: We distribute the dataset among the clients using Dirichlet allocation (Wang et al., 2020) with β = 0.5. We use ResNet-18 (He et al., 2016) with Group Normalization (Wu & He, 2018). We perform 100 communication rounds, in each of which all clients participate. We set E = 1, K = 10 and η = 0.01. |
| Hardware Specification | Yes | These results, presented in Table 2, were obtained using a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the BFGS algorithm and refers to specific model architectures like ResNet-18 and LSTM, but does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or their dependencies. |
| Experiment Setup | Yes | For the first setting, we also measure and report the wall-clock time (in seconds) for both the benchmark methods and DQN-Fed. These results, presented in Table 2, were obtained using a single NVIDIA RTX 3090 GPU. As shown, the second-order methods (last three columns) exhibit higher computational times. Notably, among the second-order methods, DQN-Fed demonstrates the fastest runtime. For all benchmark methods, we conducted a grid search to identify the optimal hyper-parameters for the underlying algorithms. The parameters tested for each method are outlined below: q-FFL: q ∈ {0, 0.001, 0.01, 0.1, 1, 2, 5, 10}. TERM: t ∈ {0.1, 0.5, 1, 2, 5}. FedLF: η_t ∈ {0.01, 0.05, 0.1, 0.5, 1}. Ditto: λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 2, 5}. FedMGDA+: ϵ ∈ {0.01, 0.05, 0.1, 0.5, 1}. FedHEAL: (α, β) = (0.5, 0.5), (γ_s, γ_c) = (0.5, 0.9). |
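The per-client BFGS update in Equation (19) of Algorithm 1 is the standard BFGS Hessian-approximation step. A minimal sketch follows; this is a generic reference implementation, not the authors' code, and the name `bfgs_update` is our own.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS step (Eq. 19):
        B_new = B - (B s s^T B) / (s^T B s) + (y y^T) / (s^T y)

    B: current Hessian approximation (n x n, symmetric positive definite)
    s: parameter difference   theta_hat^e - theta_hat^{e-1}
    y: gradient difference    grad f(theta_hat^e) - grad f(theta_hat^{e-1})
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
```

A useful sanity check on this update is that the result stays symmetric and satisfies the secant condition B_new @ s = y, which the rank-two correction enforces by construction.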
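Setup 1's class-sorted shard split (200 shards, two shards per client, equal local dataset sizes) can be sketched as below. This is our own illustrative implementation assuming integer class labels; the function name `shard_split` and the seeding are not from the paper.

```python
import numpy as np

def shard_split(labels, num_clients=100, shards_per_client=2, seed=0):
    """Non-IID split of Setup 1: sort indices by class, cut them into
    num_clients * shards_per_client shards, and assign shards_per_client
    shards to each client without replacement."""
    rng = np.random.default_rng(seed)
    num_shards = num_clients * shards_per_client   # 200 shards in the paper
    by_class = np.argsort(labels, kind="stable")   # sample indices sorted by class
    shards = np.array_split(by_class, num_shards)
    picks = rng.permutation(num_shards)            # draw shards without replacement
    return [np.concatenate([shards[i] for i in
                            picks[c * shards_per_client:(c + 1) * shards_per_client]])
            for c in range(num_clients)]
```

With balanced class counts, every client receives the same number of samples drawn from at most two classes, which is what makes this split strongly non-IID.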
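Setup 2's Dirichlet allocation with β = 0.5 (Wang et al., 2020) is a common non-IID partitioning scheme: for each class, client proportions are drawn from a Dirichlet distribution and that class's samples are split accordingly. A minimal sketch, with `dirichlet_split` our own name:

```python
import numpy as np

def dirichlet_split(labels, num_clients=10, beta=0.5, seed=0):
    """Partition sample indices across clients: for each class, draw
    proportions from Dirichlet(beta, ..., beta) and cut that class's
    (shuffled) indices at the corresponding cumulative fractions."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet([beta] * num_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for bucket, part in zip(clients, np.split(idx, cuts)):
            bucket.extend(part.tolist())
    return clients
```

Smaller β yields more skewed class distributions per client; β = 0.5, as used in the paper, gives a moderately heterogeneous split.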