NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel

Authors: Gabriel Thompson, Kai Yue, Chau-Wai Wong, Huaiyu Dai

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our approach consistently achieves higher accuracy than baselines in highly heterogeneous settings, where other approaches often underperform. Additionally, it reaches target performance in 4.6 times fewer communication rounds. We validate our approach across multiple datasets, network topologies, and heterogeneity settings to ensure robustness and generalization.
Researcher Affiliation | Academia | ¹Electrical and Computer Engineering, NC State University; ²Secure Computing Institute, NC State University, Raleigh, USA. Correspondence to: Chau-Wai Wong <EMAIL>.
Pseudocode | Yes | A. NTK-DFL Algorithms: Algorithm 1 (Consolidated Federated Learning Process); Algorithm 2 (Per-Round Parameter Averaging); Algorithm 3 (Local Jacobian Computation and Sending Jacobians); Algorithm 4 (Weight Evolution)
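The per-round parameter averaging named above is, in standard decentralized FL, a gossip step against a mixing matrix over the communication topology. A minimal sketch (not the paper's exact Algorithm 2), assuming each client holds a flattened weight vector and the mixing matrix is doubly stochastic:

```python
import numpy as np

def average_with_neighbors(params, mixing):
    """One gossip round of per-round parameter averaging.

    params: (n_clients, dim) array; row i is client i's flattened weights.
    mixing: (n_clients, n_clients) doubly stochastic matrix; entry (i, j)
            is the weight client i gives to neighbor j (0 if not connected).
    Returns the post-averaging parameters for every client.
    """
    return mixing @ params

# Toy usage: 3 fully connected clients with uniform mixing weights.
W = np.full((3, 3), 1 / 3)
theta = np.array([[0.0], [3.0], [6.0]])
print(average_with_neighbors(theta, W))  # every client moves to the mean, 3.0
```

With a sparser topology, only the nonzero entries of `mixing` are exchanged, and repeated rounds drive all clients toward consensus.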
Open Source Code | Yes | Source code for NTK-DFL is available at https://github.com/Gabe-Thomp/ntk-dfl.
Open Datasets | Yes | Following Yue et al. (2022), we experiment on three datasets: Fashion-MNIST (Xiao et al., 2017), FEMNIST (Caldas et al., 2019), and MNIST (LeCun et al., 1998).
Dataset Splits | Yes | For Fashion-MNIST and MNIST, data heterogeneity is introduced in the form of non-IID partitions created by the symmetric Dirichlet distribution (Good, 1976). ... In FEMNIST, data is split into shards based on the writer of each digit, introducing heterogeneity in the form of feature skewness. ... When evaluating the selection algorithm in Figure 6, we split the global test set in a 50:50 ratio of validation to test data.
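Symmetric-Dirichlet non-IID partitioning as described above can be sketched as follows: for each class, draw per-client proportions from Dirichlet(α) and split that class's samples accordingly, so smaller α yields more label skew. This is a common construction, not the paper's exact splitting code:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng=None):
    """Split sample indices into non-IID client shards.

    For each class, per-client proportions are drawn from a symmetric
    Dirichlet(alpha); smaller alpha concentrates each class on few clients.
    Returns a list of index lists, one per client.
    """
    rng = np.random.default_rng(rng)
    n_classes = int(labels.max()) + 1
    client_idx = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx

# Toy usage: 1,000 fake labels over 10 classes, 5 clients, alpha = 0.5.
labels = np.random.default_rng(0).integers(0, 10, size=1000)
shards = dirichlet_partition(labels, n_clients=5, alpha=0.5, rng=0)
print(sum(len(s) for s in shards))  # all 1,000 samples assigned exactly once
```

Sweeping `alpha` (e.g. 0.1 vs. 10) reproduces the range from highly heterogeneous to near-IID client data.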
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the model, we use a two-layer multilayer perceptron with a hidden width of 100 neurons for all trials. For D-PSGD, we use a learning rate of 0.1 and a batch size of 10 (the local epoch count is defined to be one in this approach). For DFedAvg, we use a learning rate of 0.1, a batch size of 25, and 20 local epochs. For DFedAvgM, we use a learning rate of 0.01, a batch size of 50, 20 local epochs, and a momentum of 0.9. For DisPFL, we use a learning rate of 0.1, a batch size of 10, and 10 local epochs... For DFedSAM... a radius ρ = 0.01, η = 0.01, momentum of 0.99, learning rate decay of 0.95, weight decay of 5×10⁻⁴, 5 local epochs, and a batch size of 32... As for NTK-DFL, we use a learning rate of 0.01 and search over values t ∈ {100, 200, ..., 800} during the weight evolution process.
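The search over evolution steps t fits the NTK view: weights evolve by gradient descent on the model linearized through its Jacobian, and t controls how far that descent runs. A simplified scalar-output sketch of such weight evolution (illustrative only; the function name and the stand-in `f0 = J @ w0` are assumptions, not the paper's Algorithm 4):

```python
import numpy as np

def ntk_weight_evolution(w0, J, y, lr=0.01, steps=100):
    """Gradient descent on the linearized model f(w) ~ f(w0) + J (w - w0).

    J: (n_samples, dim) Jacobian of model outputs w.r.t. weights at w0.
    y: (n_samples,) targets; squared loss. `steps` plays the role of the
       evolution-step count t searched over in the setup above.
    """
    f0 = J @ w0  # stand-in for f(w0); a real model would be evaluated here
    w = w0.astype(float).copy()
    for _ in range(steps):
        f = f0 + J @ (w - w0)             # linearized prediction
        w -= lr * J.T @ (f - y) / len(y)  # squared-loss gradient step
    return w

# Toy usage: recover a linear map's weights from its Jacobian and targets.
rng = np.random.default_rng(0)
J = rng.normal(size=(20, 5))
y = J @ rng.normal(size=5)
w = ntk_weight_evolution(np.zeros(5), J, y, lr=0.1, steps=2000)
print(np.allclose(J @ w, y, atol=1e-2))  # linearized fit reaches the targets
```

Because the dynamics are those of a linear model, running more steps t moves the fit monotonically toward the least-squares solution, which is why t can be selected by a simple grid search on validation data.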