Differentiable Prompt Learning for Vision Language Models
Authors: Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the DPL method on the pre-trained CLIP. We empirically find that by using only limited data, our DPL method can find a deep continuous prompt configuration with high confidence. The performance on the downstream tasks exhibits the superiority of the automatic design: our method boosts the average test accuracy by 2.60% on 11 datasets compared to baseline methods. The few-shot learning experiments show that the DPL method can find continuous prompt configurations, i.e., the context length and depth of continuous prompts inserted into the input of each layer. The performance of downstream fine-tuning over 11 datasets shows the superiority of the proposed method. |
| Researcher Affiliation | Collaboration | Zhenhan Huang (1), Tejaswini Pedapati (2), Pin-Yu Chen (2) and Jianxi Gao (1); (1) Rensselaer Polytechnic Institute, (2) IBM Research. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Searching stage for vision-language models. 1: Input: A pre-trained model and two α matrices A_α ∈ ℝ^{ℓ×t} with randomly initialized weights. 2: while not converged do 3: Update A_α by descending ∇_{A_α} L_val(E, A_α). 4: Update continuous prompts in both the text branch and the image branch by descending ∇_E L_train(E, A_α). 5: end while 6: for i = 1 to ℓ do 7: A_α[i,k] = max_m A_α[i,m]; k determines the context length of continuous prompts for the i-th block in the best prompt configuration. 8: end for 9: Output: Prompt configuration for the image branch and the text branch. |
| Open Source Code | Yes | We release our code in https://github.com/Zhenhan-Huang/Differentiable-Prompt-Learn. |
| Open Datasets | Yes | We evaluate the DPL method on 11 datasets: Caltech101 [Fei-Fei et al., 2004] and ImageNet [Deng et al., 2009] for the generic object classification, Describable Textures [Cimpoi et al., 2014] for the texture classification, EuroSAT [Helber et al., 2019] for the satellite image classification, FGVCAircraft [Maji et al., 2013], Food101 [Bossard et al., 2014], Oxford Flowers [Nilsback and Zisserman, 2008], Oxford Pets [Parkhi et al., 2012], and Stanford Cars [Krause et al., 2013] for the fine-grained image recognition, UCF101 [Soomro et al., 2012] for the action classification, and SUN397 [Xiao et al., 2010] for the scene recognition. |
| Dataset Splits | No | We use the few-shot learning setting in the searching stage. The number of shots is the same for the searching and training stages. The number of shots is 16. The results of using 8/4/2/1 shots are shown in Appendix A.5. |
| Hardware Specification | Yes | Experiments are conducted using a single NVIDIA A40 GPU. |
| Software Dependencies | No | The paper mentions software components such as the pre-trained CLIP model and PyTorch, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | In both the training stage and the searching stage, we use the same hyperparameters except for the number of epochs. The number of epochs in the searching stage is 60 while that for the training stage is 40. The batch size is 4 and we use stochastic gradient descent (SGD) to optimize continuous prompts. In the searching stage, two α matrices are optimized using the SGD strategy. The learning rate is 3.5 × 10⁻³. |
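The searching stage quoted above alternates two gradient steps (on the α matrix and on the continuous prompts) and then reads the prompt configuration off the α matrix row-wise via argmax. The toy sketch below illustrates only that control flow, not the paper's actual implementation: the shapes, the quadratic placeholder losses, and the names `E` and `A_alpha` are illustrative assumptions, while the learning rate (3.5 × 10⁻³) and epoch count (60) come from the Experiment Setup row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's quantities (illustrative only):
#   E       -- continuous prompts for text + image branches, here one flat vector
#   A_alpha -- architecture matrix in R^{l x t}: row i holds scores over t
#              candidate context lengths for the i-th transformer block
num_blocks, num_candidates = 4, 5
E = rng.normal(size=8)
A_alpha = rng.normal(size=(num_blocks, num_candidates))

lr = 3.5e-3          # learning rate reported in the paper
for epoch in range(60):  # searching stage runs 60 epochs in the paper
    # Step 3: update A_alpha by descending grad of a placeholder "validation"
    # loss sum(A_alpha**2); its gradient is 2 * A_alpha.
    A_alpha -= lr * 2.0 * A_alpha
    # Step 4: update prompts E by descending grad of a placeholder "training"
    # loss sum((E - 1)**2); its gradient is 2 * (E - 1).
    E -= lr * 2.0 * (E - 1.0)

# Steps 6-8: derive the configuration; for each block, the candidate with the
# largest alpha score determines that block's context length.
config = A_alpha.argmax(axis=1)
print(config)  # one context-length index per transformer block
```

In the paper this is a bi-level (DARTS-style) optimization where the two losses are computed on separate validation and training splits; the placeholders here only stand in for those differentiable objectives.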