Training Consistent Mixture-of-Experts-Based Prompt Generator for Continual Learning
Authors: Yue Lu, Shizhou Zhang, De Cheng, Guoqiang Liang, Yinghui Xing, Nannan Wang, Yanning Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four class-incremental learning benchmarks validate the effectiveness and superiority of our approach. We evaluate our approach using four class-incremental learning (CIL) benchmarks: 10-split and 20-split CIFAR-100 (Krizhevsky and Hinton 2009), 10-split ImageNet-R (Hendrycks et al. 2021) and 10-split DomainNet (Peng et al. 2019). We report the mean values of the final average accuracy and final average forgetting over three runs with different random seeds. Table 1: Comparison between the proposed approach (-CPG) and the baseline of sequential fine-tuning using VPT and CLIP models. The upper-bound means jointly training all the classes in the dataset. |
| Researcher Affiliation | Academia | 1School of Computer Science, Northwestern Polytechnical University, China 2School of Telecommunications Engineering, Xidian University, China |
| Pseudocode | Yes | An algorithm of our approach is provided in the Algorithm section of the Appendix. |
| Open Source Code | Yes | Our code is available at https://github.com/zugexiaodui/ConsistentMoEPromptGenerator. |
| Open Datasets | Yes | We evaluate our approach using four class-incremental learning (CIL) benchmarks: 10-split and 20-split CIFAR-100 (Krizhevsky and Hinton 2009), 10-split ImageNet-R (Hendrycks et al. 2021) and 10-split DomainNet (Peng et al. 2019). |
| Dataset Splits | Yes | We evaluate our approach using four class-incremental learning (CIL) benchmarks: 10-split and 20-split CIFAR-100 (Krizhevsky and Hinton 2009), 10-split ImageNet-R (Hendrycks et al. 2021) and 10-split DomainNet (Peng et al. 2019). We focus on the class-incremental learning protocol: the label space Y_t of task T_t is disjoint with other tasks, i.e., ∩_{t=1}^{T} Y_t = ∅. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. It mentions the backbone model (ViT-B/16) but no specific GPU/CPU models or other hardware details. |
| Software Dependencies | No | The paper mentions using a Vision Transformer (ViT) and a balancing-expert strategy by Shazeer et al. (2017) but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We employ the ViT-B/16 (Dosovitskiy et al. 2021) as the backbone for all experiments. Each of the 12 ViT layers is equipped with our proposed prompt generator by default. The MoE includes 36 experts (M = 36), and we select the top four experts (K = 4) for CIFAR-100 and ImageNet-R, while 16 experts are chosen for DomainNet. To enhance the training of the MoE, we implement the balancing-expert strategy as proposed by Shazeer et al. (2017). We report the mean values of the final average accuracy and final average forgetting over three runs with different random seeds. Additional experimental details can be found in the Appendix. |
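The Experiment Setup row names the key MoE hyperparameters (M = 36 experts, top-K routing with K = 4 or 16, and the balancing-expert strategy of Shazeer et al. 2017). A minimal pure-Python sketch of that routing scheme is shown below; the function names and the importance-based balancing loss are illustrative assumptions, not the authors' released implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over gate logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_gate(logits, k):
    # Keep only the k largest gate probabilities and renormalize
    # them so the selected experts' weights sum to one.
    probs = softmax(logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return {i: probs[i] / total for i in topk}

def moe_output(x, experts, gate_logits, k):
    # Sparse mixture: weighted sum over the top-k experts only.
    gates = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())

def load_balancing_loss(batch_gate_probs):
    # Squared coefficient of variation of per-expert importance,
    # in the spirit of Shazeer et al. (2017): the loss is zero when
    # every expert receives the same total gate probability.
    num_experts = len(batch_gate_probs[0])
    importance = [sum(row[e] for row in batch_gate_probs)
                  for e in range(num_experts)]
    mean = sum(importance) / num_experts
    var = sum((v - mean) ** 2 for v in importance) / num_experts
    return var / (mean ** 2 + 1e-10)
```

With M = 36 and K = 4 as in the paper, `top_k_gate` would route each token to four experts, and `load_balancing_loss` would be added to the training objective to discourage expert collapse.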