Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning

Authors: Yilun Li, Miaomiao Cheng, Xu Han, Wei Song

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations demonstrate that DeKg serves as a plug-and-play module that can seamlessly integrate with existing knowledge-guided context optimization methods and achieves superior performance on three challenging benchmarks. We make our code available at https://github.com/cnunlp/DeKg.
Researcher Affiliation | Academia | Yilun Li, Miaomiao Cheng, Xu Han, Wei Song. College of Information Engineering, Capital Normal University, Beijing, China.
Pseudocode | No | The paper describes its methodology using mathematical equations and prose (e.g., L = L_ce + λL_kg + µL_kd, Eq. 5) but does not include any explicitly labeled pseudocode or algorithm blocks.
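The overall objective quoted above (Eq. 5) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cross-entropy and the KgCoOp-style knowledge-guided term (1 minus cosine similarity between learned and frozen hand-crafted text features) follow common conventions, and the divergence term L_kd is passed in as a precomputed scalar since the report does not specify its form.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

def kg_loss(learned_feat, fixed_feat):
    # Knowledge-guided term: 1 - cosine similarity between learned prompt
    # text features and frozen hand-crafted text features, averaged over
    # classes (KgCoOp-style; assumed form, not taken from this report).
    cos = (learned_feat * fixed_feat).sum(-1) / (
        np.linalg.norm(learned_feat, axis=-1)
        * np.linalg.norm(fixed_feat, axis=-1))
    return (1.0 - cos).mean()

def total_loss(logits, targets, learned_feat, fixed_feat, l_kd,
               lam=6.0, mu=2.0):
    # L = L_ce + lambda * L_kg + mu * L_kd  (Eq. 5 as quoted;
    # lambda = 6 and mu = 2 per the experiment-setup row below).
    return cross_entropy(logits, targets) + lam * kg_loss(
        learned_feat, fixed_feat) + mu * l_kd
```

With identical learned and fixed features the knowledge-guided term vanishes, so the total reduces to the cross-entropy plus µ·L_kd.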
Open Source Code | Yes | We make our code available at https://github.com/cnunlp/DeKg.
Open Datasets | Yes | For downstream tasks, we follow previous work (Radford et al., 2021; Zhou et al., 2022a;b) to conduct experiments on 11 representative image classification datasets, including ImageNet (Deng et al., 2009) and Caltech (Fei-Fei et al., 2004) for generic object classification; Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and FGVCAircraft (Maji et al., 2013) for fine-grained visual categorization; EuroSAT (Helber et al., 2019) for satellite image classification; UCF101 (Soomro et al., 2012) for action recognition; DTD (Cimpoi et al., 2014) for texture classification; and SUN397 (Xiao et al., 2010) for scene recognition.
Dataset Splits | Yes | The base-to-new generalization setting aims to evaluate whether models learned on base tasks can generalize to new tasks with unseen classes, i.e., a category shift exists between base and new tasks. Following the baselines, on each dataset we first construct a base and a new task by equally dividing the dataset's classes into two groups, then perform prompt tuning on the base classes and test the learned model on both the base and new tasks. Table 1 presents the performance of different methods across 11 datasets with 16-shot samples... To verify the model's ability to develop robust representations with a severely limited amount of downstream data, we follow previous work (Yao et al., 2024) to train the model using K-shot labeled source images from each class and evaluate the testing domain with the same spaces as the training classes. A summary comparison of the 4-shot setting between the proposed DeKg and existing baselines appears in Table 3.
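The base-to-new protocol described in this row is a simple class-level split. The sketch below is illustrative only; the rounding convention for odd class counts and the use of the dataset's default class ordering are assumptions, matching what the CoOp-family codebases typically do rather than anything stated in this report.

```python
import math

def base_new_split(class_names):
    # Equally divide a dataset's classes into a "base" half (used for
    # prompt tuning) and a "new" half (held out for generalization).
    # With an odd class count, the base group gets the extra class
    # (an assumed convention, not specified in the report).
    m = math.ceil(len(class_names) / 2)
    return class_names[:m], class_names[m:]
```

Tuning happens only on the base classes; evaluation is then run on both halves, so the new-class accuracy measures transfer under the category shift.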
Hardware Specification | Yes | All experiments were carried out using the HYGON DCU-Z100L server.
Software Dependencies | No | The paper mentions 'Our implementation is based on KgCoOp's (Yao et al., 2023) and TCP's (Yao et al., 2024) codes' and 'ViT-B/16 (Dosovitskiy et al., 2021) as the vision backbone'. While it names code bases and model architectures, it does not specify software dependencies such as the Python version, the deep learning framework (e.g., PyTorch, TensorFlow) and its version, or other library versions.
Experiment Setup | Yes | Our implementation is based on KgCoOp's (Yao et al., 2023) and TCP's (Yao et al., 2024) codes. To ensure a fair comparison, all experiments were conducted using ViT-B/16 (Dosovitskiy et al., 2021) as the vision backbone with the context length set to 4. Additionally, we maintained consistency with the corresponding baselines in DeKg-KgCoOp and DeKg-TCP for random prompt initialization, training epochs, training schedule, and data augmentation settings. In our experiments, we set the ratio of λ/µ to 3/1 by grid search, which translates to λ being 6 and µ being 2.
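The grid search mentioned for the λ/µ ratio can be pictured as an exhaustive sweep over candidate weight pairs, keeping the pair with the best validation score. The candidate grids and the `evaluate` callback below are hypothetical; the report only states the selected ratio of 3:1 (λ = 6, µ = 2).

```python
import itertools

def grid_search_weights(evaluate, lambdas=(2, 4, 6, 8), mus=(1, 2, 4)):
    # Exhaustive search over (lambda, mu) pairs; `evaluate` is a
    # user-supplied callback returning a validation score to maximize.
    # Grid values are illustrative, not taken from the paper.
    return max(itertools.product(lambdas, mus),
               key=lambda pair: evaluate(*pair))
```

In practice the callback would train the prompts with the given weights and return held-out accuracy; here any scalar-valued function of (λ, µ) works.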