KPL: Training-Free Medical Knowledge Mining of Vision-Language Models

Authors: Jiaxiang Liu, Tianxiang Hu, Jiawei Du, Ruiyuan Zhang, Joey Tianyi Zhou, Zuozhu Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on both medical and natural image datasets demonstrate that KPL enables effective zero-shot image classification, outperforming all baselines. These findings highlight the great potential of this paradigm of mining knowledge from CLIP for medical image classification and broader areas.
Researcher Affiliation | Academia | (1) ZJU-Angelalign R&D Center for Intelligence Healthcare, ZJU-UIUC Institute, Zhejiang University, China; (2) Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence, Zhejiang University, China; (3) Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; (4) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore
Pseudocode | Yes | Pseudocode is provided in Algorithms 1 and 2.
Open Source Code | Yes | Code: https://github.com/JXLiu-AI/KPL
Open Datasets | Yes | To validate KPL in medical zero-shot scenarios, we utilized several datasets including Shenzhen (Jaeger et al. 2014), IDRiD (Porwal et al. 2018), Malaria Cell (Hassan et al. 2022), Cataract (Rokhana, Herulambang, and Indraswari 2022), and Montgomery (Jaeger et al. 2014). These datasets cover a diverse range of diseases (Appendix Table 1), aligning with previous research (Liu et al. 2023b). Additionally, to test the versatility of KPL, we applied it to a wider array of natural image datasets including CUB (Wah et al. 2011), Places365 (López-Cifuentes et al. 2020), Oxford Pets (Zhang et al. 2022), and ImageNet (Deng et al. 2009).
Dataset Splits | No | The paper lists multiple datasets but does not explicitly state how they were split into training, validation, or test sets (e.g., percentages or sample counts). It refers to 'zero-shot scenarios' but gives no specific split details for reproducibility.
Hardware Specification | Yes | We perform experiments in the PyTorch framework on an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions the 'PyTorch framework', 'LLMs (GPT-3.5-turbo)', and 'ChatGPT' but does not specify exact version numbers for any libraries or solvers required for replication.
Experiment Setup | Yes | For the determination of key hyper-parameters, refer to Appendix Figure 2. We construct the KEB using LLMs (GPT-3.5-turbo) to generate as many descriptions as possible for each category name, ensuring a minimum of n = 50 descriptions. The parameters k, Nmax, and τ are determined through grid search (Appendix Figure 1).
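The grid search mentioned above for the hyper-parameters k, Nmax, and τ can be sketched as an exhaustive sweep. This is a minimal illustration only: the value ranges and the toy scoring function below are assumptions for demonstration, not the grids or metric reported in the paper (those are in its Appendix Figure 1).

```python
import itertools

def grid_search(score_fn, grid):
    """Evaluate every combination of values in `grid` and return the
    best-scoring parameter dict (higher score is better)."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical value ranges; the paper's actual grids are not given here.
grid = {
    "k": [5, 10, 20],          # e.g. number of retrieved knowledge descriptions
    "n_max": [25, 50, 100],    # e.g. selection cap (the paper's Nmax)
    "tau": [0.01, 0.05, 0.1],  # e.g. a temperature-like parameter
}
```

In practice, `score_fn` would run the zero-shot classification pipeline on a held-out set and return its accuracy; any function mapping the three parameters to a scalar works with this driver.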