Hypernetwork Aggregation for Decentralized Personalized Federated Learning
Authors: Weishi Li, Yong Peng, Mengyao Du, Fuhui Sun, Xiaoyan Wang, Li Shen
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various data-heterogeneous environments demonstrate that DFedHP can reduce communication costs, accelerate the convergence rate, and improve generalization performance compared with state-of-the-art (SOTA) baselines. We conduct experiments in non-IID settings across different data partitions (Dirichlet and Pathological distributions) and different partition coefficients. We then compare our algorithm with many SOTA baselines on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Extensive evaluations on various classification tasks show that our algorithm achieves competitive performance, with improvements in both communication cost and convergence. |
| Researcher Affiliation | Academia | 1. College of Systems Engineering, National University of Defense Technology; 2. Information Technology Service Center of People's Court; 3. School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 DFedHP. 1: Input: total number of devices n and communication rounds T; learning rates for the personal part η_v, the hypernetwork η_φ, and the embedding vectors η_z; numbers of local iterates K_v, K_φ, and K_z. 2: Output: hypernetwork parameters φ_i^T, personal part v_i^T, and embedding vectors z_i^T of all clients after the final communication round. |
| Open Source Code | No | The paper does not contain any explicit statements about code availability, nor does it provide links to source code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | We evaluate the performance of DFedHP on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets under the Dirichlet distribution and the Pathological distribution. |
| Dataset Splits | No | We partition the training and testing data according to the Dirichlet distribution Dir(α). The smaller the partition coefficient α is, the more uneven the data distribution among clients will be, resulting in higher data heterogeneity [Kotelevskii et al., 2023; Wang et al., 2020]. In addition, we sample different classes from the dataset for each client; the fewer classes each client has, the more heterogeneous the setting becomes. The paper describes how data is partitioned among clients to create non-IID settings, but does not provide specific train/test/validation percentages or absolute sample counts needed to reproduce the splits. |
| Hardware Specification | No | The paper mentions "video memory costs" in Table 3 but does not specify any particular GPU models (e.g., NVIDIA A100), CPU models, or detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using ResNet-18 (a model architecture) and SGD (an optimizer) but does not provide specific software names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | In all experiments, all algorithms are run on ResNet-18 [He et al., 2016] with batch normalization. We record the communication between the client and server (or between clients) over 150 rounds. The total number of clients is 100 and the communication ratio is 0.1 for each round. The batch size is 128 and the local training epoch is 4. The experiments use SGD as the optimizer with momentum 0.9. The distributed topology is random. For DFedHP, the embedding vector size is 128. Although the dimensions of kernels differ between layers, a kernel dimension is often an integer multiple of a fixed value [Ha et al., 2016]; we choose 64 as the fixed value for ResNet-18. We conduct multiple experiments on the learning rates of all algorithms. |
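The Dirichlet-based non-IID partition described in the Dataset Splits row can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and the toy label array are hypothetical, and only the mechanism (per-class proportions drawn from Dir(α), smaller α giving more skewed client shards) comes from the paper.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients via a Dirichlet prior.

    Smaller alpha -> more uneven per-client class proportions,
    i.e. higher data heterogeneity, as described in the paper.
    Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Draw this class's per-client proportions from Dir(alpha).
        proportions = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]

# Toy example: 10 classes with 500 samples each, 100 clients (as in the paper).
labels = np.repeat(np.arange(10), 500)
parts = dirichlet_partition(labels, n_clients=100, alpha=0.3)
```

Every sample lands with exactly one client, so the shards form a disjoint cover of the dataset regardless of α.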
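The Experiment Setup row notes that DFedHP uses 128-dimensional embedding vectors and treats 64 as the fixed kernel-dimension unit for ResNet-18, following the chunked hypernetwork idea of Ha et al. (2016). A minimal sketch of that chunking scheme, assuming a plain linear hypernetwork (the matrix `W_hyper` and function `generate_layer` are hypothetical, not from the paper):

```python
import numpy as np

CHUNK = 64       # fixed kernel-dimension unit chosen for ResNet-18 in the paper
EMBED_DIM = 128  # embedding-vector size reported for DFedHP

rng = np.random.default_rng(0)
# Hypothetical linear hypernetwork: maps one embedding to one CHUNK x CHUNK block.
W_hyper = rng.standard_normal((EMBED_DIM, CHUNK * CHUNK)) * 0.01

def generate_layer(embeddings):
    """Generate a layer's weight matrix block-by-block from embeddings.

    A layer whose dimensions are multiples of CHUNK is tiled by
    CHUNK x CHUNK blocks, one embedding per block (after Ha et al., 2016).
    `embeddings` has shape (n_row_blocks, n_col_blocks, EMBED_DIM).
    """
    n_rows, n_cols = embeddings.shape[:2]
    chunks = embeddings.reshape(-1, EMBED_DIM) @ W_hyper
    blocks = chunks.reshape(n_rows, n_cols, CHUNK, CHUNK)
    # Stitch the blocks into an (n_rows*CHUNK, n_cols*CHUNK) weight matrix.
    return blocks.transpose(0, 2, 1, 3).reshape(n_rows * CHUNK, n_cols * CHUNK)

z = rng.standard_normal((2, 2, EMBED_DIM))  # embeddings for a 128 x 128 layer
W = generate_layer(z)
```

Because kernel dimensions are multiples of the fixed value, one small hypernetwork can emit weights for every layer; only the embeddings `z` vary per block, which is what keeps the shared model compact.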