Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Authors: Jun Luo, Chen Chen, Shandong Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 9 datasets under various federated settings demonstrate the efficacy of the proposed pFedMoAP algorithm. The results verify the superiority of pFedMoAP over compared state-of-the-art methods. |
| Researcher Affiliation | Academia | Jun Luo, Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213, USA, EMAIL; Chen Chen, Center for Research in Computer Vision, University of Central Florida, Orlando, FL 32816, USA, EMAIL; Shandong Wu, Intelligent Systems Program, Department of Radiology, Department of Biomedical Informatics, Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA, EMAIL |
| Pseudocode | Yes | Algorithm 1 pFedMoAP |
| Open Source Code | Yes | The code is available at https://github.com/ljaiverson/pFedMoAP. |
| Open Datasets | Yes | We evaluate the efficacy of the proposed pFedMoAP with 9 public benchmark datasets under various federated settings to simulate different types of data heterogeneity. Following previous research (Guo et al., 2023b), to evaluate pFedMoAP under label heterogeneity, we adopt 5 representative visual classification datasets used to evaluate CLIP (Radford et al., 2021), namely Oxford Pets (Parkhi et al., 2012), Flowers102 (Nilsback & Zisserman, 2008), DTD (Cimpoi et al., 2014), Caltech101 (Fei-Fei, 2004), and Food101 (Bossard et al., 2014). We refer to these datasets collectively as CLIP datasets. On these datasets, we test pFedMoAP's few-shot performance under label heterogeneity by employing a pathological non-IID setting, where the classes are evenly distributed to the clients with no overlapping classes between any two clients. In addition, we use the CIFAR10 and CIFAR100 datasets and a Dirichlet distribution with Dir(α = 0.5) to simulate the label shift (Hsu et al., 2019). For feature heterogeneity, we adopt the DomainNet dataset (Peng et al., 2019) and the Office-Caltech10 dataset (Gong et al., 2012), with 6 and 4 inherent domains, respectively. |
| Dataset Splits | Yes | 1) CLIP datasets. Each dataset in the CLIP datasets is partitioned into N = 10 clients, with a disjoint set of classes evenly and randomly assigned to each client. Training proceeds for T = 10 rounds with a participation rate of r = 100%; CLIP uses a ResNet50 backbone. 2) CIFAR10 & CIFAR100. The data is partitioned into N = 100 clients via Dir(α = 0.5), with T = 120 rounds and r = 10%; CLIP again uses a ResNet50 backbone. 3) DomainNet & Office-Caltech10. Each domain of these two datasets is partitioned into 5 clients with Dir(α = 0.3), resulting in N = 30 for DomainNet and N = 20 for Office-Caltech10. T = 25 for both datasets, while r = 25% for DomainNet and r = 50% for Office-Caltech10. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts) used for running its experiments. It mentions using 'Bridges-2 at Pittsburgh Supercomputing Center' but without explicit hardware specifications like GPU/CPU models. |
| Software Dependencies | No | The paper mentions models like CLIP with ResNet50 and ViT-B/16 backbones, but does not provide specific versions for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | For all methods, we use SGD as the optimizer with a learning rate of 0.002 and 5 local epochs (except CoOp, which is trained locally for 25 epochs without FL). For pFedMoAP, we use SGD with a learning rate of 0.01 to train the h = 8-head gating network and the default λ = 0.5 in Eq. (10). |
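The two heterogeneity settings quoted above (the pathological non-IID class assignment and the Dirichlet label-shift partition) can be sketched as below. This is an illustrative reconstruction, not the authors' released code: the function names, the synthetic labels, and the random seed are all assumptions made for the sketch.

```python
import numpy as np

def pathological_partition(num_classes, num_clients=10, seed=0):
    """Pathological non-IID: classes are evenly and randomly assigned to
    clients, with no class shared between any two clients."""
    rng = np.random.default_rng(seed)
    classes = rng.permutation(num_classes)
    return np.array_split(classes, num_clients)

def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Label shift via Dir(alpha): for each class, a Dirichlet draw decides
    what fraction of that class's samples each client receives."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points over this class's samples.
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Example: 10,000 CIFAR10-like labels split over N = 100 clients, alpha = 0.5.
labels = np.random.default_rng(1).integers(0, 10, size=10_000)
parts = dirichlet_partition(labels, num_clients=100, alpha=0.5)
class_sets = pathological_partition(num_classes=100, num_clients=10)
```

Smaller `alpha` makes each client's label distribution more skewed; the settings above use α = 0.5 for CIFAR10/100 and α = 0.3 within each domain of DomainNet and Office-Caltech10.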