Hypernetwork Aggregation for Decentralized Personalized Federated Learning

Authors: Weishi Li, Yong Peng, Mengyao Du, Fuhui Sun, Xiaoyan Wang, Li Shen

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in various data-heterogeneous environments demonstrate that DFedHP can reduce communication costs, accelerate the convergence rate, and improve generalization performance compared with state-of-the-art (SOTA) baselines. We conduct experiments in non-IID settings across different data partitions (Dirichlet and Pathological distributions) and different partition coefficients. We then compare the performance of our algorithm with many SOTA baselines on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Extensive evaluations on various classification tasks show that our algorithm achieves competitive performance, with improvements in both communication cost and convergence.
Researcher Affiliation | Academia | ¹College of Systems Engineering, National University of Defense Technology; ²Information Technology Service Center of People's Court; ³School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 DFedHP. 1: Input: total number of devices n and communication rounds T; learning rates for the personal part η_v, the hypernetwork η_φ, and the embedding vectors η_z; numbers of local iterations K_v, K_φ, and K_z. 2: Output: hypernetwork parameters φ_i^T, personal parts v_i^T, and embedding vectors z_i^T of all clients after the final communication round.
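The update structure the pseudocode describes can be sketched as a toy round: each client holds a shared hypernetwork (here a plain linear map `Phi`), a personal embedding `z`, and a personal part `v` (here a scalar bias); clients take local gradient steps on all three and then gossip-average the hypernetwork parameters. The linear hypernetwork, single local step, ring topology, and all names and shapes are illustrative assumptions, not the paper's implementation (which uses deep models and a random topology).

```python
import numpy as np

# Toy DFedHP-style round: shared-weight part w = Phi @ z is generated by a
# (linear) hypernetwork from a personal embedding; v is a personal bias.
rng = np.random.default_rng(0)
n_clients, d, embed = 4, 5, 3
X = rng.standard_normal((n_clients, 32, d))
true_w = rng.standard_normal(d)
# Each client shares true_w but has a personal shift -> heterogeneous targets.
y = X @ true_w + 0.1 * rng.standard_normal((n_clients, 32)) + np.arange(n_clients)[:, None]

Phi = [0.1 * rng.standard_normal((d, embed)) for _ in range(n_clients)]
z = [0.1 * rng.standard_normal(embed) for _ in range(n_clients)]
v = [0.0] * n_clients
eta_phi, eta_z, eta_v = 0.05, 0.05, 0.1  # hypothetical learning rates

def client_loss(i):
    r = X[i] @ (Phi[i] @ z[i]) + v[i] - y[i]
    return float(np.mean(r ** 2))

def local_step(i):
    w = Phi[i] @ z[i]
    r = X[i] @ w + v[i] - y[i]          # residuals
    g_w = 2 * X[i].T @ r / len(r)       # dL/dw
    g_phi = np.outer(g_w, z[i])         # chain rule through w = Phi z
    g_z = Phi[i].T @ g_w
    Phi[i] -= eta_phi * g_phi
    z[i] -= eta_z * g_z
    v[i] -= eta_v * 2 * float(np.mean(r))

loss0 = np.mean([client_loss(i) for i in range(n_clients)])
for _ in range(50):
    for i in range(n_clients):
        local_step(i)
    # Gossip: average hypernetwork parameters with ring neighbours; the
    # personal parts z and v are never communicated.
    Phi = [(Phi[i - 1] + Phi[i] + Phi[(i + 1) % n_clients]) / 3
           for i in range(n_clients)]
loss1 = np.mean([client_loss(i) for i in range(n_clients)])
```

Only the hypernetwork parameters are exchanged, which is the source of the communication savings: the personalized state (`z`, `v`) stays local.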
Open Source Code | No | The paper does not contain any explicit statements about code availability, nor does it provide links to source code repositories or mention code in supplementary materials.
Open Datasets | Yes | We evaluate the performance of DFedHP on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets under the Dirichlet and Pathological distributions.
Dataset Splits | No | We partition the training and testing data according to the Dirichlet distribution Dir(α). The smaller the partition coefficient α is, the more uneven the data distribution among clients will be, resulting in higher data heterogeneity [Kotelevskii et al., 2023; Wang et al., 2020]. In addition, we sample different classes from the dataset for each client; the fewer classes each client has, the more heterogeneous the setting becomes. The paper describes how data is partitioned among clients to create non-IID settings, but does not provide specific train/test/validation percentages or absolute sample counts needed to reproduce the splits.
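The Dir(α) partition described above can be sketched as follows: for each class, a Dirichlet(α) draw decides what fraction of that class goes to each client, so small α concentrates a class on few clients. Function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with a per-class Dirichlet(alpha) prior.

    Smaller alpha -> more skewed label distributions (higher heterogeneity).
    """
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]
```

Applying the same routine independently to the training and test labels yields matching non-IID train/test splits per client.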
Hardware Specification | No | The paper mentions "video memory costs" in Table 3 but does not specify any particular GPU models (e.g., NVIDIA A100), CPU models, or detailed machine specifications used to run the experiments.
Software Dependencies | No | The paper mentions using ResNet-18 (a model architecture) and SGD (an optimizer) but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | Yes | In all experiments, all algorithms are run on ResNet-18 [He et al., 2016] with batch normalization. We record the communication between the client and server (or between clients) over 150 rounds. The total number of clients is 100 and the communication ratio is 0.1 per round. The batch size is 128 and the number of local training epochs is 4. The experiments use SGD with momentum 0.9 as the optimizer. The communication topology is random. For DFedHP, the embedding vector size is 128. Although the kernel dimensions differ across layers, a kernel's dimension is often an integer multiple of a fixed value [Ha et al., 2016]; we choose 64 as the fixed value for ResNet-18. We tune the learning rates of all algorithms over multiple experiments.
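The fixed-value convention above (128-d embedding, chunks of 64 parameters) can be sketched as chunked weight generation in the style of Ha et al.'s hypernetworks: a layer whose kernel holds a multiple of 64 parameters is assembled from consecutive 64-parameter chunks. The single linear head and the per-chunk position embeddings are hypothetical simplifications, not the paper's architecture.

```python
import numpy as np

EMBED_DIM, CHUNK = 128, 64  # embedding size and fixed chunk size from the paper

rng = np.random.default_rng(0)
# Hypothetical linear hypernetwork head, shared across chunk positions.
W_head = 0.01 * rng.standard_normal((CHUNK, EMBED_DIM))

def generate_weights(z, n_params, position_embeds):
    """Assemble a layer's n_params weights from ceil(n_params / CHUNK) chunks.

    z: the client's 128-d embedding vector.
    position_embeds: one 128-d vector per chunk position (hypothetical way to
    make each chunk distinct while sharing the head W_head).
    """
    n_chunks = -(-n_params // CHUNK)  # ceil division
    chunks = [W_head @ (z + e) for e in position_embeds[:n_chunks]]
    return np.concatenate(chunks)[:n_params]  # trim any overhang
```

For instance, a 3x3 convolution with 64 input and 64 output channels holds 3·3·64·64 = 36,864 parameters, i.e. exactly 576 chunks of 64, which is why a fixed value dividing every layer size keeps the generation uniform.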