Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning

Authors: Yilun Li, Miaomiao Cheng, Xu Han, Wei Song

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations demonstrate that DeKg serves as a plug-and-play module that can seamlessly integrate with existing knowledge-guided context optimization methods and achieves superior performance on three challenging benchmarks. We make our code available at https://github.com/cnunlp/DeKg.
Researcher Affiliation | Academia | Yilun Li, Miaomiao Cheng, Xu Han, Wei Song. College of Information Engineering, Capital Normal University, Beijing, China.
Pseudocode | No | The paper describes its methodology using mathematical equations and prose (e.g., L = L_ce + λL_kg + µL_kd, Eq. 5) but does not include any explicitly labeled pseudocode or algorithm blocks.
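The overall objective quoted above (Eq. 5) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cross-entropy and the KgCoOp-style knowledge-guided term (1 minus cosine similarity between learned and frozen hand-crafted text features) follow common conventions, and the divergence term L_kd is passed in as a precomputed scalar since the report does not specify its form.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def kg_loss(learned_feat, fixed_feat):
    # Knowledge-guided term: 1 - cosine similarity between learned prompt
    # text features and frozen hand-crafted text features, averaged over
    # classes (KgCoOp-style; assumed form, not taken from this report).
    cos = (learned_feat * fixed_feat).sum(-1) / (
        np.linalg.norm(learned_feat, axis=-1)
        * np.linalg.norm(fixed_feat, axis=-1))
    return (1.0 - cos).mean()

def total_loss(logits, targets, learned_feat, fixed_feat, l_kd,
               lam=6.0, mu=2.0):
    # L = L_ce + lambda * L_kg + mu * L_kd  (Eq. 5 as quoted;
    # lambda = 6 and mu = 2 per the experiment-setup row below).
    return cross_entropy(logits, targets) + lam * kg_loss(
        learned_feat, fixed_feat) + mu * l_kd
```

With identical learned and fixed features the knowledge-guided term vanishes, so the total reduces to the cross-entropy plus µ·L_kd.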
Open Source Code | Yes | We make our code available at https://github.com/cnunlp/DeKg.
Open Datasets | Yes | For downstream tasks, we follow previous work (Radford et al., 2021; Zhou et al., 2022a;b) to conduct experiments on 11 representative image classification datasets, including ImageNet (Deng et al., 2009) and Caltech (Fei-Fei et al., 2004) for generic object classification; Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and FGVCAircraft (Maji et al., 2013) for fine-grained visual categorization; EuroSAT (Helber et al., 2019) for satellite image classification; UCF101 (Soomro et al., 2012) for action recognition; DTD (Cimpoi et al., 2014) for texture classification; and SUN397 (Xiao et al., 2010) for scene recognition.
Dataset Splits | Yes | The base-to-new generalization setting aims to evaluate whether models learned on base tasks can generalize to new tasks with unseen classes, i.e., a category shift exists between base and new tasks. Following the baselines, on each dataset we first construct a base and a new task by equally dividing the dataset's classes into two groups, then perform prompt tuning on the base classes and test the learned model on both the base and new tasks. Table 1 presents the performance of different methods across 11 datasets with 16-shot samples... To verify the model's ability to develop robust representations with a severely limited amount of downstream data, we follow previous work (Yao et al., 2024) to train the model using K-shot labeled source images from each class and evaluate the testing domain with the same spaces as the training classes. A summary comparison of the 4-shot setting between the proposed DeKg and existing baselines appears in Table 3.
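The base-to-new protocol described in this row is a simple class-level split. The sketch below is illustrative only; the rounding convention for odd class counts and the use of the dataset's default class ordering are assumptions, matching what the CoOp-family codebases typically do rather than anything stated in this report.

```python
import math

def base_new_split(class_names):
    # Equally divide a dataset's classes into a "base" half (used for
    # prompt tuning) and a "new" half (held out for generalization).
    # With an odd class count, the base group gets the extra class
    # (an assumed convention, not specified in the report).
    m = math.ceil(len(class_names) / 2)
    return class_names[:m], class_names[m:]
```

Tuning happens only on the base classes; evaluation is then run on both halves, so the new-class accuracy measures transfer under the category shift.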
Hardware Specification | Yes | All experiments were carried out using the HYGON DCU-Z100L server.
Software Dependencies | No | The paper mentions 'Our implementation is based on KgCoOp's (Yao et al., 2023) and TCP's (Yao et al., 2024) codes' and 'ViT-B/16 (Dosovitskiy et al., 2021) as the vision backbone'. While it names code bases and model architectures, it does not specify software dependencies such as the Python version, the deep learning framework (e.g., PyTorch, TensorFlow) and its version, or other library versions.
Experiment Setup | Yes | Our implementation is based on KgCoOp's (Yao et al., 2023) and TCP's (Yao et al., 2024) codes. To ensure a fair comparison, all experiments were conducted using ViT-B/16 (Dosovitskiy et al., 2021) as the vision backbone with the context length set to 4. Additionally, we maintained consistency with the corresponding baselines in DeKg-KgCoOp and DeKg-TCP for random prompt initialization, training epochs, training schedule, and data augmentation settings. In our experiments, we set the ratio of λ/µ to 3/1 by grid search, which translates to λ being 6 and µ being 2.
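The grid search mentioned for the λ/µ ratio can be pictured as an exhaustive sweep over candidate weight pairs, keeping the pair with the best validation score. The candidate grids and the `evaluate` callback below are hypothetical; the report only states the selected ratio of 3:1 (λ = 6, µ = 2).

```python
import itertools

def grid_search_weights(evaluate, lambdas=(2, 4, 6, 8), mus=(1, 2, 4)):
    # Exhaustive search over (lambda, mu) pairs; `evaluate` is a
    # user-supplied callback returning a validation score to maximize.
    # Grid values are illustrative, not taken from the paper.
    return max(itertools.product(lambdas, mus),
               key=lambda pair: evaluate(*pair))
```

In practice the callback would train the prompts with the given weights and return held-out accuracy; here any scalar-valued function of (λ, µ) works.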