BMIP: Bi-directional Modality Interaction Prompt Learning for VLM
Authors: Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper is experimental: "Comprehensive experiments on various datasets reveal that BMIP not only outperforms current state-of-the-art methods across all three evaluation paradigms but is also flexible enough to be combined with other prompt-based methods for consistent performance enhancement." The authors report results on 15 benchmarks showing that BMIP achieves state-of-the-art (SOTA) performance across all tasks, particularly in the open-world generalization evaluation paradigm, and consistently enhances other prompt learning methods when combined with them. |
| Researcher Affiliation | Academia | ¹School of Intelligence Science and Technology, Nanjing University, China; ²School of Artificial Intelligence, Nanjing University, China; ³National Key Laboratory for Novel Software Technology, Nanjing University, China |
| Pseudocode | No | The paper describes the proposed BMIP method and its components (Deep Language Prompt Learning, Deep Vision Prompt Learning, Vision Language Modality Interaction) conceptually with mathematical formulations, but it does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states "The implementation details are in Appendix A.", but Appendix A is not provided, and there is no statement of code release or link to an open-source repository for the described method. |
| Open Datasets | Yes | "We assessed 11 different image classification datasets in the open-world generalization and cross-dataset transfer tasks, encompassing ImageNet [Deng et al., 2009], Caltech101 [Fei-Fei et al., 2004], OxfordPets [Parkhi et al., 2012], StanfordCars [Krause et al., 2013], Flowers102 [Nilsback and Zisserman, 2008], Food101 [Bossard et al., 2014], FGVC-Aircraft [Maji et al., 2013], SUN397 [Xiao et al., 2010], UCF101 [Soomro et al., 2012], DTD [Cimpoi et al., 2014], and EuroSAT [Helber et al., 2019]. For the domain generalization task, ImageNet serves as our source dataset, while 4 variants ImageNet-V2 [Recht et al., 2019], ImageNet-Sketch [Wang et al., 2019], ImageNet-A [Hendrycks et al., 2021b], and ImageNet-R [Hendrycks et al., 2021a] serve as the target datasets." |
| Dataset Splits | No | The paper describes evaluation paradigms such as "generalization from base to novel classes" and "open-world generalization," which involve base and new classes, but it does not specify concrete split percentages (e.g., 80/10/10), absolute sample counts per split, or references to predefined splits that would allow exact reproduction. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "The implementation details are in Appendix A.", but Appendix A is not provided, and there are no specific software names with version numbers listed in the main text. |
| Experiment Setup | No | The paper states "The implementation details are in Appendix A.", but Appendix A is not provided, and the main text contains no experimental setup details such as hyperparameter values (learning rate, batch size, number of epochs), optimizer settings, or other training configurations. |