Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Authors: Sua Lee, Kyubum Shin, Jung Ho Park

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach achieves superior performance across 11 recognition datasets.
Researcher Affiliation | Collaboration | Sua Lee (Seoul National University), Kyubum Shin (Naver AI), Jung Ho Park (Seoul National University)
Pseudocode | Yes | Algorithm 1: Training process of DeMul
Open Source Code | No | The paper discusses the use of the publicly available pre-trained CLIP model, but does not explicitly state that the authors' own implementation code is open-sourced, nor does it provide a link.
Open Datasets | Yes | We evaluate our approach over 11 datasets, including ImageNet (Deng et al., 2009) and publicly available image recognition datasets used in GalLoP (Lafon et al., 2024): SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), UCF101 (Soomro et al., 2012), Caltech101 (Li et al., 2017), EuroSAT (Helber et al., 2019), FGVC Aircraft (Maji et al., 2013), Food101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), Oxford Flowers (Nilsback & Zisserman, 2008), and Oxford Pets (Parkhi et al., 2012).
Dataset Splits | Yes | We follow the same train and test set splits as provided by CoOp (Zhou et al., 2022b) and GalLoP (Lafon et al., 2024), using 1, 2, 4, 8, and 16 shots for training and the full test sets for evaluation.
Hardware Specification | Yes | We run experiments on a V100 GPU for datasets with fewer than 100 classes and an A100 GPU for larger datasets.
Software Dependencies | No | The paper mentions using a publicly available pre-trained CLIP model and GPT-based embedding models through APIs, but does not provide version numbers for software dependencies such as PyTorch, Python, or other libraries used in the implementation.
Experiment Setup | Yes | The number of context tokens N and the number of prompts M are set to 16 and 32, respectively. We optimize the prompts for 100 epochs with the SGD optimizer and a cosine-decay learning rate scheduler, while the base learning rate is set to 0.01 on most datasets. The regularization weight λ is set to 0.05, and the loss balance parameter α is set to 0.5.
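The cosine-decay learning rate schedule in the experiment setup above can be sketched as follows. This is a minimal illustration, not the authors' code: the exact cosine formula is an assumption (the standard cosine-annealing form used by common deep learning frameworks, decaying from the base rate toward zero), and the helper name `cosine_decay_lr` is hypothetical.

```python
import math

# Hyperparameters reported in the paper's experiment setup.
NUM_CONTEXT_TOKENS = 16   # N
NUM_PROMPTS = 32          # M
BASE_LR = 0.01            # base learning rate (most datasets)
EPOCHS = 100              # total training epochs
REG_WEIGHT = 0.05         # regularization weight (lambda)
LOSS_BALANCE = 0.5        # loss balance parameter (alpha)

def cosine_decay_lr(epoch: int,
                    base_lr: float = BASE_LR,
                    total_epochs: int = EPOCHS) -> float:
    """Learning rate at a given 0-indexed epoch under standard
    cosine annealing: base_lr at epoch 0, decaying to 0 at the end."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

print(cosine_decay_lr(0))    # full base rate at the first epoch
print(cosine_decay_lr(50))   # about half the base rate mid-training
print(cosine_decay_lr(100))  # decayed to (numerically) zero at the end
```

In a PyTorch-based implementation this schedule would typically come from a built-in cosine-annealing scheduler attached to the SGD optimizer rather than being computed by hand.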