FedSPU: Personalized Federated Learning for Resource-Constrained Devices with Stochastic Parameter Update

Authors: Ziru Niu, Hai Dong, A. K. Qin

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that FedSPU outperforms federated dropout by 4.45% on average in terms of accuracy. Furthermore, an introduced early-stopping scheme reduces FedSPU's training time by 25%–71% while maintaining high accuracy.
Researcher Affiliation | Academia | (1) RMIT University, Melbourne, VIC 3000, Australia; (2) Swinburne University of Technology, Hawthorn, VIC 3122, Australia. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: FedSPU; Algorithm 2: FedSPU with Early Stopping (FedSPU+ES)
Open Source Code | No | The paper mentions that the experiment is implemented with PyTorch and the Flower framework, but it does not provide an explicit statement or link for the source code of its own methodology.
Open Datasets | Yes | We evaluate FedSPU on three real-world datasets that are commonly used in the state of the art: Extended MNIST (EMNIST) contains 814,255 images of handwritten digits/characters from 62 categories (numbers 0-9 and 52 upper/lower-case English letters); each sample is a black-and-white image with 28×28 pixels (Cohen et al. 2017). CIFAR10 contains 50,000 images of real-world objects across 10 categories; each sample is an RGB color image with 32×32 pixels (Krizhevsky et al. 2009). Google Speech is an audio dataset containing 101,012 audio commands from more than 2,000 speakers; each sample is a human-spoken word belonging to one of 35 categories (Warden 2018).
Dataset Splits | Yes | We split each client's dataset into a training set and a testing set with the split factor λ = 0.7. Ltest is the testing error of wt_k on client k's validation set.
Hardware Specification | Yes | The server runs on a desktop computer and clients run on NVIDIA Jetson Nano Developer Kits with one 128-core Maxwell GPU and 4 GB of 64-bit memory.
Software Dependencies | Yes | The experiment is implemented with PyTorch 2.0.0 and the Flower framework (Beutel et al. 2022).
Experiment Setup | Yes | The maximum global iteration is set to T = 500 with a total of M = 100 clients. The number of active clients per round is set to 10, and each client has five local training epochs (Horváth et al. 2021). For ES, if the number of non-stopped clients is less than 10, then all non-stopped clients will be selected. The learning rate is set to 2e-4, 5e-4 and 0.1 respectively for EMNIST, Google Speech and CIFAR10. The batch size is set to 16 for EMNIST and Google Speech, and 128 for CIFAR10. ... The values of pk for the five clusters are 0.2, 0.4, 0.6, 0.8 and 1.0 respectively. For FedSelect, the initial and final values of pk are set to 0.25 and 0.5 (Tamirisa et al. 2024).
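The per-client split reported under "Dataset Splits" (split factor λ = 0.7) could be sketched as follows. The function name, uniform shuffling, and fixed seed are assumptions for illustration, not details from the paper:

```python
import random


def split_client_dataset(samples, split_factor=0.7, seed=0):
    """Split one client's local samples into a training set and a testing set.

    A fraction `split_factor` of the (shuffled) samples goes to training,
    the remainder to testing, mirroring the lambda = 0.7 split in the paper.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * split_factor)
    return shuffled[:cut], shuffled[cut:]
```

For example, a client holding 10 samples would end up with 7 training and 3 testing samples under this split.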
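The setup values quoted above can be collected into a single configuration sketch. The dict structure and key names are assumptions made for readability; only the values come from the paper's reported setup:

```python
# Hypothetical configuration layout; values taken from the Experiment Setup row.
CONFIG = {
    "global_rounds": 500,             # T
    "total_clients": 100,             # M
    "active_clients_per_round": 10,
    "local_epochs": 5,
    "learning_rate": {"EMNIST": 2e-4, "GoogleSpeech": 5e-4, "CIFAR10": 0.1},
    "batch_size": {"EMNIST": 16, "GoogleSpeech": 16, "CIFAR10": 128},
    "cluster_pk": [0.2, 0.4, 0.6, 0.8, 1.0],      # pk per client cluster
    "fedselect_pk": {"initial": 0.25, "final": 0.5},  # FedSelect baseline
}
```

Keeping the per-dataset values in nested dicts makes it easy to look up, e.g., `CONFIG["batch_size"]["CIFAR10"]` when wiring up a training loop.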