NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel
Authors: Gabriel Thompson, Kai Yue, Chau-Wai Wong, Huaiyu Dai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our approach consistently achieves higher accuracy than baselines in highly heterogeneous settings, where other approaches often underperform. Additionally, it reaches target performance in 4.6 times fewer communication rounds. We validate our approach across multiple datasets, network topologies, and heterogeneity settings to ensure robustness and generalization. |
| Researcher Affiliation | Academia | 1Electrical and Computer Engineering, NC State University 2Secure Computing Institute, NC State University, Raleigh, USA. Correspondence to: Chau-Wai Wong <EMAIL>. |
| Pseudocode | Yes | A. NTK-DFL Algorithms: Algorithm 1 (Consolidated Federated Learning Process); Algorithm 2 (Per-Round Parameter Averaging); Algorithm 3 (Local Jacobian Computation and Sending Jacobians); Algorithm 4 (Weight Evolution) |
| Open Source Code | Yes | Source code for NTK-DFL is available at https://github.com/Gabe-Thomp/ntk-dfl. |
| Open Datasets | Yes | Following Yue et al. (2022), we experiment on three datasets: Fashion-MNIST (Xiao et al., 2017), FEMNIST (Caldas et al., 2019), and MNIST (Lecun et al., 1998). |
| Dataset Splits | Yes | For Fashion-MNIST and MNIST, data heterogeneity has been introduced in the form of non-IID partitions created by the symmetric Dirichlet distribution (Good, 1976). ... In FEMNIST, data is split into shards based on the writer of each digit, introducing heterogeneity in the form of feature-skewness. ... When evaluating the selection algorithm in Figure 6, we split the global test set in a 50:50 ratio of validation to test data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For the model, we use a two-layer multilayer perceptron with a hidden width of 100 neurons for all trials. For D-PSGD, we use a learning rate of 0.1 and a batch size of 10 (the local epoch count is defined to be one in this approach). For DFedAvg, we use a learning rate of 0.1, a batch size of 25, and 20 local epochs. For DFedAvgM, we use a learning rate of 0.01, a batch size of 50, 20 local epochs, and a momentum of 0.9. For DisPFL, we use a learning rate of 0.1, a batch size of 10, and 10 local epochs... For DFedSAM... a radius ρ = 0.01, η = 0.01, momentum of 0.99, learning rate decay of 0.95, weight decay of 5 × 10^-4, 5 local epochs, and a batch size of 32... As for NTK-DFL, we use a learning rate of 0.01 and search over values t ∈ {100, 200, ..., 800} during the weight evolution process. |
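The non-IID partitioning described in the Dataset Splits row (symmetric Dirichlet over class proportions) can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name `dirichlet_partition` and its parameters are hypothetical, but the technique matches the standard Dirichlet-based label-skew split, where a smaller `alpha` yields more heterogeneous clients.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices into non-IID client shards by drawing each
    class's per-client proportions from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Turn proportions into split points within this class's samples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(ci) for ci in client_indices]
```

With `alpha = 0.1` most clients hold only a few classes, while `alpha = 100` approaches an IID split; every sample is assigned to exactly one client.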