pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Authors: Shentong Mo, Xufang Luo, Dongsheng Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains. The results demonstrate that our pMoE not only achieves superior performance with a large margin of improvements but also offers an optimal trade-off between computational efficiency and adaptation effectiveness compared to existing methods. |
| Researcher Affiliation | Collaboration | Shentong Mo1, Xufang Luo2, Dongsheng Li2 1Carnegie Mellon University 2Microsoft Research |
| Pseudocode | Yes | In this section, we describe the overall procedure for our proposed framework pMoE, which leverages a dynamic Mixture-of-Experts (MoE) prompt tuning mechanism to integrate knowledge from multiple domain experts. The key components of the algorithm include the injection of Expert Prompt Tokens (EPTs) and the dynamic dispatching mechanism, which ensures efficient use of expert knowledge, as shown in Algorithm 1. |
| Open Source Code | No | We are committed to sharing our code and pre-trained models with the research community upon publication, allowing for transparency, easy replication of our experiments, and further development. |
| Open Datasets | Yes | Datasets. For the general domain, we leverage two popular classification benchmarks: FGVC (Wah et al., 2011; Nilsback & Zisserman, 2008; Gebru et al., 2017; Khosla et al., 2011; Van Horn et al., 2015) and VTAB-1K (Zhai et al., 2019). For medical imaging, we utilize a broad set of datasets from the Med-VTAB benchmark (Mo et al., 2024a), covering a wide range of medical tasks. For the segmentation tasks, we include the ADE20K (Zhou et al., 2017; 2018), Kvasir-seg polyp (Jha et al., 2020) and the ISIC skin lesion dataset (Gutman et al., 2016). |
| Dataset Splits | Yes | We follow the same training and validation split as prior work (Jia et al., 2022; Yoo et al., 2023; Mo et al., 2024b). Each task contains 1000 training samples, and we follow the standard splits used in previous work (Jia et al., 2022; Yoo et al., 2023; Mo et al., 2024b). Both datasets are evaluated using 5-fold cross-validation, with performance reported as the average across test splits. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs, with 80 GB of memory, allowing us to efficiently fine-tune models across diverse datasets. |
| Software Dependencies | No | We implement pMoE using the PyTorch (Paszke et al., 2019) library. We fine-tune the models using the AdamW optimizer (Loshchilov & Hutter, 2017). |
| Experiment Setup | Yes | For both general and medical datasets, we fine-tune the prompt tokens with the AdamW optimizer (Loshchilov & Hutter, 2017), using a learning rate of 1e-4 and a weight decay of 1e-5. The batch size is set to 32 for all datasets, and training is conducted over 30 epochs. |
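The paper's code is not released, but the described mechanism (Expert Prompt Tokens injected into the token sequence, with a dynamic dispatcher selecting experts) can be sketched in PyTorch. This is a minimal, hypothetical illustration, not the authors' implementation: the module name `PromptMoE`, the router design, the top-k mixing, and all dimensions are assumptions; only the optimizer settings (AdamW, lr 1e-4, weight decay 1e-5) come from the reported setup.

```python
import torch
import torch.nn as nn


class PromptMoE(nn.Module):
    """Hypothetical sketch of MoE prompt tuning: per-expert prompt
    banks (EPTs) plus a learned router that dispatches to the top-k
    experts per input. All names and shapes are illustrative."""

    def __init__(self, dim=768, n_experts=4, n_prompts=8, top_k=2):
        super().__init__()
        # one bank of Expert Prompt Tokens per domain expert: (E, P, D)
        self.ept = nn.Parameter(torch.randn(n_experts, n_prompts, dim) * 0.02)
        # router scores experts from the [CLS] token
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (B, N, D) token sequence, x[:, 0] is [CLS]
        weights = self.router(x[:, 0]).softmax(dim=-1)       # (B, E)
        topw, topi = weights.topk(self.top_k, dim=-1)        # (B, k)
        topw = topw / topw.sum(dim=-1, keepdim=True)         # renormalize
        prompts = self.ept[topi]                             # (B, k, P, D)
        mixed = (topw[..., None, None] * prompts).sum(dim=1) # (B, P, D)
        # inject mixed expert prompts after [CLS]
        return torch.cat([x[:, :1], mixed, x[:, 1:]], dim=1)


# Only the prompt/router parameters are tuned, with the reported settings.
moe = PromptMoE()
optimizer = torch.optim.AdamW(moe.parameters(), lr=1e-4, weight_decay=1e-5)
```

In a ViT backbone this module would be applied to the token sequence before (or between) transformer blocks, with the backbone weights frozen so that only the EPTs and router receive gradients.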