Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Authors: Sua Lee, Kyubum Shin, Jung Ho Park

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach achieves superior performance across 11 recognition datasets.
Researcher Affiliation | Collaboration | Sua Lee (Seoul National University), Kyubum Shin (Naver AI), Jung Ho Park (Seoul National University)
Pseudocode | Yes | Algorithm 1: Training process of DeMul
Open Source Code | No | The paper discusses the use of the publicly available pre-trained CLIP model, but does not explicitly state that the authors' own implementation code is open-sourced, nor does it provide a link.
Open Datasets | Yes | We evaluate our approach over 11 datasets, including ImageNet (Deng et al., 2009) and publicly available image recognition datasets used in GalLoP (Lafon et al., 2024): SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), UCF101 (Soomro et al., 2012), Caltech101 (Li et al., 2017), EuroSAT (Helber et al., 2019), FGVC Aircraft (Maji et al., 2013), Food101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), Oxford Flowers (Nilsback & Zisserman, 2008), and Oxford Pets (Parkhi et al., 2012).
Dataset Splits | Yes | We follow the same train and test set splits as provided by CoOp (Zhou et al., 2022b) and GalLoP (Lafon et al., 2024), using 1, 2, 4, 8, and 16 shots for training and the full test sets for evaluation.
Hardware Specification | Yes | We run experiments on a V100 GPU for datasets with fewer than 100 classes and an A100 GPU for larger datasets.
Software Dependencies | No | The paper mentions using a publicly available pre-trained CLIP model and GPT-based embedding models through APIs, but does not provide version numbers for software dependencies such as PyTorch, Python, or other libraries used in the implementation.
Experiment Setup | Yes | The number of context tokens N and the number of prompts M are set to 16 and 32, respectively. We optimize the prompts for 100 epochs with the SGD optimizer and a cosine-decay learning rate scheduler, while the base learning rate is set to 0.01 on most datasets. The regularization weight λ is set to 0.05, and the loss balance parameter α is set to 0.5.
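The cosine-decay learning rate schedule in the experiment setup above can be sketched as follows. This is a minimal illustration, not the authors' code: the exact cosine formula is an assumption (the standard cosine-annealing form used by common deep learning frameworks, decaying from the base rate toward zero), and the helper name `cosine_decay_lr` is hypothetical.

```python
import math

# Hyperparameters reported in the paper's experiment setup.
NUM_CONTEXT_TOKENS = 16   # N
NUM_PROMPTS = 32          # M
BASE_LR = 0.01            # base learning rate (most datasets)
EPOCHS = 100              # total training epochs
REG_WEIGHT = 0.05         # regularization weight (lambda)
LOSS_BALANCE = 0.5        # loss balance parameter (alpha)

def cosine_decay_lr(epoch: int,
                    base_lr: float = BASE_LR,
                    total_epochs: int = EPOCHS) -> float:
    """Learning rate at a given 0-indexed epoch under standard
    cosine annealing: base_lr at epoch 0, decaying to 0 at the end."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

print(cosine_decay_lr(0))    # full base rate at the first epoch
print(cosine_decay_lr(50))   # about half the base rate mid-training
print(cosine_decay_lr(100))  # decayed to (numerically) zero at the end
```

In a PyTorch-based implementation this schedule would typically come from a built-in cosine-annealing scheduler attached to the SGD optimizer rather than being computed by hand.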