NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel

Authors: Gabriel Thompson, Kai Yue, Chau-Wai Wong, Huaiyu Dai

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our approach consistently achieves higher accuracy than baselines in highly heterogeneous settings, where other approaches often underperform. Additionally, it reaches target performance in 4.6 times fewer communication rounds. We validate our approach across multiple datasets, network topologies, and heterogeneity settings to ensure robustness and generalization.
Researcher Affiliation | Academia | ¹Electrical and Computer Engineering, NC State University; ²Secure Computing Institute, NC State University, Raleigh, USA. Correspondence to: Chau-Wai Wong <EMAIL>.
Pseudocode | Yes | A. NTK-DFL Algorithms: Algorithm 1 (Consolidated Federated Learning Process); Algorithm 2 (Per-Round Parameter Averaging); Algorithm 3 (Local Jacobian Computation and Sending Jacobians); Algorithm 4 (Weight Evolution)
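The per-round parameter averaging named above is, in standard decentralized FL, a gossip step against a mixing matrix over the communication topology. A minimal sketch (not the paper's exact Algorithm 2), assuming each client holds a flattened weight vector and the mixing matrix is doubly stochastic:

```python
import numpy as np

def average_with_neighbors(params, mixing):
    """One gossip round of per-round parameter averaging.

    params: (n_clients, dim) array; row i is client i's flattened weights.
    mixing: (n_clients, n_clients) doubly stochastic matrix; entry (i, j)
            is the weight client i gives to neighbor j (0 if not connected).
    Returns the post-averaging parameters for every client.
    """
    return mixing @ params

# Toy usage: 3 fully connected clients with uniform mixing weights.
W = np.full((3, 3), 1 / 3)
theta = np.array([[0.0], [3.0], [6.0]])
print(average_with_neighbors(theta, W))  # every client moves to the mean, 3.0
```

With a sparser topology, only the nonzero entries of `mixing` are exchanged, and repeated rounds drive all clients toward consensus.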
Open Source Code | Yes | Source code for NTK-DFL is available at https://github.com/Gabe-Thomp/ntk-dfl.
Open Datasets | Yes | Following Yue et al. (2022), we experiment on three datasets: Fashion-MNIST (Xiao et al., 2017), FEMNIST (Caldas et al., 2019), and MNIST (LeCun et al., 1998).
Dataset Splits | Yes | For Fashion-MNIST and MNIST, data heterogeneity is introduced in the form of non-IID partitions created by the symmetric Dirichlet distribution (Good, 1976). ... In FEMNIST, data is split into shards based on the writer of each digit, introducing heterogeneity in the form of feature skewness. ... When evaluating the selection algorithm in Figure 6, we split the global test set in a 50:50 ratio of validation to test data.
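Symmetric-Dirichlet non-IID partitioning as described above can be sketched as follows: for each class, draw per-client proportions from Dirichlet(α) and split that class's samples accordingly, so smaller α yields more label skew. This is a common construction, not the paper's exact splitting code:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng=None):
    """Split sample indices into non-IID client shards.

    For each class, per-client proportions are drawn from a symmetric
    Dirichlet(alpha); smaller alpha concentrates each class on few clients.
    Returns a list of index lists, one per client.
    """
    rng = np.random.default_rng(rng)
    n_classes = int(labels.max()) + 1
    client_idx = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx

# Toy usage: 1,000 fake labels over 10 classes, 5 clients, alpha = 0.5.
labels = np.random.default_rng(0).integers(0, 10, size=1000)
shards = dirichlet_partition(labels, n_clients=5, alpha=0.5, rng=0)
print(sum(len(s) for s in shards))  # all 1,000 samples assigned exactly once
```

Sweeping `alpha` (e.g. 0.1 vs. 10) reproduces the range from highly heterogeneous to near-IID client data.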
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the model, we use a two-layer multilayer perceptron with a hidden width of 100 neurons for all trials. For D-PSGD, we use a learning rate of 0.1 and a batch size of 10 (the local epoch count is defined to be one in this approach). For DFedAvg, we use a learning rate of 0.1, a batch size of 25, and 20 local epochs. For DFedAvgM, we use a learning rate of 0.01, a batch size of 50, 20 local epochs, and a momentum of 0.9. For DisPFL, we use a learning rate of 0.1, a batch size of 10, and 10 local epochs... For DFedSAM... a radius ρ = 0.01, η = 0.01, momentum of 0.99, learning rate decay of 0.95, weight decay of 5×10⁻⁴, 5 local epochs, and a batch size of 32... As for NTK-DFL, we use a learning rate of 0.01 and search over values t ∈ {100, 200, ..., 800} during the weight evolution process.
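The search over evolution steps t fits the NTK view: weights evolve by gradient descent on the model linearized through its Jacobian, and t controls how far that descent runs. A simplified scalar-output sketch of such weight evolution (illustrative only; the function name and the stand-in `f0 = J @ w0` are assumptions, not the paper's Algorithm 4):

```python
import numpy as np

def ntk_weight_evolution(w0, J, y, lr=0.01, steps=100):
    """Gradient descent on the linearized model f(w) ~ f(w0) + J (w - w0).

    J: (n_samples, dim) Jacobian of model outputs w.r.t. weights at w0.
    y: (n_samples,) targets; squared loss. `steps` plays the role of the
       evolution-step count t searched over in the setup above.
    """
    f0 = J @ w0  # stand-in for f(w0); a real model would be evaluated here
    w = w0.astype(float).copy()
    for _ in range(steps):
        f = f0 + J @ (w - w0)             # linearized prediction
        w -= lr * J.T @ (f - y) / len(y)  # squared-loss gradient step
    return w

# Toy usage: recover a linear map's weights from its Jacobian and targets.
rng = np.random.default_rng(0)
J = rng.normal(size=(20, 5))
y = J @ rng.normal(size=5)
w = ntk_weight_evolution(np.zeros(5), J, y, lr=0.1, steps=2000)
print(np.allclose(J @ w, y, atol=1e-2))  # linearized fit reaches the targets
```

Because the dynamics are those of a linear model, running more steps t moves the fit monotonically toward the least-squares solution, which is why t can be selected by a simple grid search on validation data.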