Visual Prompt Based Personalized Federated Learning

Authors: Guanghao Li, Wansen Wu, Yan Sun, Li Shen, Baoyuan Wu, Dacheng Tao

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the CIFAR10 and CIFAR100 datasets show that pFedPT outperforms several state-of-the-art (SOTA) PFL algorithms by a large margin in various settings. We validate pFedPT on two image classification datasets, including CIFAR10 (Krizhevsky et al., 2009) and CIFAR100 (Krizhevsky et al., 2009). Empirical results show that pFedPT beats other SOTA methods of PFL with a 1%-3% improvement in test accuracy.
Researcher Affiliation Collaboration Guanghao Li*, National University of Defense Technology; Wansen Wu, National University of Defense Technology; Yan Sun, The University of Sydney; Li Shen, JD Explore Academy; Baoyuan Wu, The Chinese University of Hong Kong, Shenzhen; Dacheng Tao, The University of Sydney & JD Explore Academy
Pseudocode Yes Algorithm 1: pFedPT framework
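The paper's prompt method is "padding" with prompt size 4 (see the experiment setup below): a client-specific learnable border is overlaid on each input image before it enters the shared backbone. A minimal NumPy sketch of that idea, not the authors' implementation (function name, CHW layout, and additive overlay are assumptions):

```python
import numpy as np

def apply_padding_prompt(image, prompt, pad=4):
    """Overlay a learnable border prompt of width `pad` on a CHW image.

    `prompt` has the same shape as `image`; only its border pixels are
    applied. In pFedPT-style training, `prompt` would be a per-client
    learnable parameter while the backbone is trained federatedly.
    """
    mask = np.zeros_like(image)
    mask[:, :pad, :] = 1   # top border rows
    mask[:, -pad:, :] = 1  # bottom border rows
    mask[:, :, :pad] = 1   # left border columns
    mask[:, :, -pad:] = 1  # right border columns
    return image + mask * prompt
```

Only the border pixels of the image are modified; the interior is passed through unchanged, which matches the "padding" prompt style where the learnable parameters frame the input.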
Open Source Code Yes The code is available at: https://github.com/hkgdifyu/pFedPT.
Open Datasets Yes We validate pFedPT on two image classification datasets, including CIFAR10 (Krizhevsky et al., 2009) and CIFAR100 (Krizhevsky et al., 2009). Dataset. We adopt real-world datasets for the image classification task, including CIFAR10, CIFAR100, and TinyImageNet (Oord et al., 2018).
Dataset Splits Yes Dirichlet Partition follows works (Hsu et al., 2019), where we partition the training data according to a Dirichlet distribution Dir(α) for each client and generate the corresponding test data for each client following the same distribution. We specify α = 0.3 for each dataset. In addition, we evaluate with the pathological partition setup similar to (Zhang et al., 2020), in which each client is only assigned a limited number of classes at random from the total number of classes. We specify that each client possesses 5 classes for CIFAR10 and 50 classes for CIFAR100.
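The Dirichlet partition described above can be sketched as follows. This is a minimal NumPy illustration of the Dir(α) label-skew split (function name and defaults are illustrative, not the authors' code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients=50, alpha=0.3, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class's samples each client receives; smaller alpha => more skew.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-client proportions of class-c samples.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]
```

The same draw of proportions can be reused to sample each client's test split, matching the paper's statement that test data follows the same per-client distribution.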
Hardware Specification No The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. It only mentions the use of CNN and ViT architectures.
Software Dependencies No The paper mentions using the SGD algorithm as the local optimizer and Tiny ViT architecture, but it does not specify software names with version numbers like Python, PyTorch, or CUDA versions.
Experiment Setup Yes We set the number of clients to 50, and then each client has a 20% chance of participating in each communication round. We utilize the SGD algorithm (Cherry et al., 1998) as the local optimizer for all methods. We use padding as our prompt method. We set batch size as 16 in the local training phase, the local training epochs for the prompt parameters and backbone as 5 in each round, the learning rate for the backbone as 0.005, the learning rate for the prompt parameters as 1, and the padding prompt size as 4. The number of communication rounds is set to 150 for CIFAR10, 300 for CIFAR100. We fix the learning rate for local training as 0.005 and for the prompt parameters training as 1.0. We fix the training batch size as 16 and fix the epoch for local training as 5. For the specific parameters in FedProx, the proximal rate is set as 0.0001. For the specific parameters in MOON, the µ is set as 1.0. For the specific parameters in FedRep, the personalized learning rate is set as 0.01. For the specific parameters in FedMTL, the iterations for solving quadratic sub-problems are set as 4000. For the specific parameters in FedBABU, the fine-tuning step is set as 1. For fine-tuning, we train the global model learned in FedAvg, FedProx, and MOON for 5 epochs.
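For reference, the hyperparameters quoted above can be collected into a single config. This is a hypothetical dictionary mirroring the reported values (the key names are illustrative, not taken from the released code):

```python
# Hypothetical config mirroring the reported pFedPT experiment setup.
PFEDPT_CONFIG = {
    "num_clients": 50,
    "participation_rate": 0.2,     # 20% of clients per round
    "optimizer": "sgd",
    "batch_size": 16,
    "local_epochs": 5,             # for both prompt and backbone
    "lr_backbone": 0.005,
    "lr_prompt": 1.0,
    "prompt": {"type": "padding", "size": 4},
    "rounds": {"cifar10": 150, "cifar100": 300},
    "baselines": {
        "fedprox": {"proximal_rate": 1e-4},
        "moon": {"mu": 1.0},
        "fedrep": {"personalized_lr": 0.01},
        "fedmtl": {"qp_iterations": 4000},
        "fedbabu": {"finetune_steps": 1},
    },
    # Global models from FedAvg/FedProx/MOON are fine-tuned for 5 epochs.
    "finetune_epochs": 5,
}
```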