pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Authors: Shentong Mo, Xufang Luo, Dongsheng Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains. The results demonstrate that our pMoE not only achieves superior performance with a large margin of improvements but also offers an optimal trade-off between computational efficiency and adaptation effectiveness compared to existing methods. |
| Researcher Affiliation | Collaboration | Shentong Mo1, Xufang Luo2, Dongsheng Li2 1Carnegie Mellon University 2Microsoft Research |
| Pseudocode | Yes | In this section, we describe the overall procedure for our proposed framework pMoE, which leverages a dynamic Mixture-of-Experts (MoE) prompt tuning mechanism to integrate knowledge from multiple domain experts. The key components of the algorithm include the injection of Expert Prompt Tokens (EPTs) and the dynamic dispatching mechanism, which ensures efficient use of expert knowledge, as shown in Algorithm 1. |
| Open Source Code | No | We are committed to sharing our code and pre-trained models with the research community upon publication, allowing for transparency, easy replication of our experiments, and further development. |
| Open Datasets | Yes | Datasets. For the general domain, we leverage two popular classification benchmarks: FGVC (Wah et al., 2011; Nilsback & Zisserman, 2008; Gebru et al., 2017; Khosla et al., 2011; Van Horn et al., 2015) and VTAB-1K (Zhai et al., 2019). For medical imaging, we utilize a broad set of datasets from the Med-VTAB benchmark (Mo et al., 2024a), covering a wide range of medical tasks. For the segmentation tasks, we include the ADE20K (Zhou et al., 2017; 2018), Kvasir-seg polyp (Jha et al., 2020) and the ISIC skin lesion dataset (Gutman et al., 2016). |
| Dataset Splits | Yes | We follow the same training and validation split as prior work (Jia et al., 2022; Yoo et al., 2023; Mo et al., 2024b). Each task contains 1000 training samples, and we follow the standard splits used in previous work (Jia et al., 2022; Yoo et al., 2023; Mo et al., 2024b). Both datasets are evaluated using 5-fold cross-validation, with performance reported as the average across test splits. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs, with 80 GB of memory, allowing us to efficiently fine-tune models across diverse datasets. |
| Software Dependencies | No | We implement pMoE using the PyTorch (Paszke et al., 2019) library. We fine-tune the models using the AdamW optimizer (Loshchilov & Hutter, 2017). |
| Experiment Setup | Yes | For both general and medical datasets, we fine-tune the prompt tokens with the AdamW optimizer (Loshchilov & Hutter, 2017), using a learning rate of 1e-4 and a weight decay of 1e-5. The batch size is set to 32 for all datasets, and training is conducted over 30 epochs. |
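The paper's code is not released, but the described mechanism (Expert Prompt Tokens injected into the token sequence, with a dynamic dispatcher selecting experts) can be sketched in PyTorch. This is a minimal, hypothetical illustration, not the authors' implementation: the module name `PromptMoE`, the router design, the top-k mixing, and all dimensions are assumptions; only the optimizer settings (AdamW, lr 1e-4, weight decay 1e-5) come from the reported setup.

```python
import torch
import torch.nn as nn


class PromptMoE(nn.Module):
    """Hypothetical sketch of MoE prompt tuning: per-expert prompt
    banks (EPTs) plus a learned router that dispatches to the top-k
    experts per input. All names and shapes are illustrative."""

    def __init__(self, dim=768, n_experts=4, n_prompts=8, top_k=2):
        super().__init__()
        # one bank of Expert Prompt Tokens per domain expert: (E, P, D)
        self.ept = nn.Parameter(torch.randn(n_experts, n_prompts, dim) * 0.02)
        # router scores experts from the [CLS] token
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (B, N, D) token sequence, x[:, 0] is [CLS]
        weights = self.router(x[:, 0]).softmax(dim=-1)       # (B, E)
        topw, topi = weights.topk(self.top_k, dim=-1)        # (B, k)
        topw = topw / topw.sum(dim=-1, keepdim=True)         # renormalize
        prompts = self.ept[topi]                             # (B, k, P, D)
        mixed = (topw[..., None, None] * prompts).sum(dim=1) # (B, P, D)
        # inject mixed expert prompts after [CLS]
        return torch.cat([x[:, :1], mixed, x[:, 1:]], dim=1)


# Only the prompt/router parameters are tuned, with the reported settings.
moe = PromptMoE()
optimizer = torch.optim.AdamW(moe.parameters(), lr=1e-4, weight_decay=1e-5)
```

In a ViT backbone this module would be applied to the token sequence before (or between) transformer blocks, with the backbone weights frozen so that only the EPTs and router receive gradients.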