Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
Authors: Sua Lee, Kyubum Shin, Jung Ho Park
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach achieves superior performance across 11 recognition datasets. |
| Researcher Affiliation | Collaboration | Sua Lee1 , Kyubum Shin2 , Jung Ho Park1 1Seoul National University, 2Naver AI |
| Pseudocode | Yes | Algorithm 1 Training process of DeMul |
| Open Source Code | No | The paper discusses the use of publicly available pre-trained CLIP model, but does not explicitly state that the authors' own implementation code is open-sourced or provide a link. |
| Open Datasets | Yes | We evaluate our approach over 11 datasets, including ImageNet (Deng et al., 2009) and publicly available image recognition datasets used in GalLoP (Lafon et al., 2024): SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), UCF101 (Soomro et al., 2012), Caltech101 (Li et al., 2017), EuroSAT (Helber et al., 2019), FGVC Aircraft (Maji et al., 2013), Food101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014), Oxford Flowers (Nilsback & Zisserman, 2008) and Oxford Pets (Parkhi et al., 2012). |
| Dataset Splits | Yes | We follow the same train and test set splits as provided by CoOp (Zhou et al., 2022b) and GalLoP (Lafon et al., 2024), using 1, 2, 4, 8 and 16 shots for training and full test sets for evaluating. |
| Hardware Specification | Yes | We run experiments on a V100 GPU for datasets with fewer than 100 classes, and utilize an A100 GPU for larger datasets. |
| Software Dependencies | No | The paper mentions using a publicly available pre-trained CLIP model and GPT-based embedding models through APIs, but does not provide specific version numbers for software dependencies like PyTorch, Python, or other libraries used in their implementation. |
| Experiment Setup | Yes | The number of context tokens N and the number of prompts M are set to 16 and 32, respectively. We optimize the prompts for 100 epochs with SGD optimizer and cosine decay learning rate scheduler, while the base learning rate is set to 0.01 on most datasets. The regularization weight λ is set at 0.05, and the loss balance parameter α is established at 0.5. |
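The reported setup (100 epochs, SGD, base learning rate 0.01 with cosine decay) can be sketched as below. This is a minimal illustrative sketch, not the authors' code: the function name `cosine_decay_lr` and the constant names are hypothetical, and only the hyperparameter values are taken from the quoted setup.

```python
import math

def cosine_decay_lr(epoch: int, total_epochs: int = 100, base_lr: float = 0.01) -> float:
    """Cosine-decayed learning rate: starts at base_lr, anneals to 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Hyperparameters reported in the quoted setup (names are illustrative):
N_CONTEXT_TOKENS = 16   # number of context tokens N
N_PROMPTS = 32          # number of prompts M
REG_WEIGHT = 0.05       # regularization weight lambda
LOSS_BALANCE = 0.5      # loss balance parameter alpha

# Learning rate over training: base_lr at epoch 0, half of it at the midpoint, 0 at the end.
schedule = [cosine_decay_lr(e) for e in range(101)]
```

In a typical PyTorch implementation this schedule corresponds to `torch.optim.SGD` wrapped with `torch.optim.lr_scheduler.CosineAnnealingLR`, though the paper does not name the framework.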