Active Prompt Learning with Vision-Language Model Priors
Authors: Hoyoung Kim, Seokhee Jin, Changhwan Sung, Jaechang Kim, Jungseul Ok
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in active learning scenarios across seven datasets demonstrate that our method outperforms existing baselines. Furthermore, our data-centric approach complements existing model-centric prompt learning methods, offering a general strategy for scalable VLM adaptation. |
| Researcher Affiliation | Academia | Hoyoung Kim (EMAIL), Seokhee Jin (EMAIL), Changhwan Sung (EMAIL), Jaechang Kim (EMAIL), Jungseul Ok (EMAIL) — all at the Graduate School of Artificial Intelligence, POSTECH. |
| Pseudocode | Yes | Algorithm 1 Proposed Active Prompt Learning. Require: image set I, initial prompts t_0, budget per round B, and the number of rounds R. 1: for r = 1, 2, . . . , R do 2: extract class-guided features for each i ∈ I by combining image and weighted text features via (7); 3: perform K-means clustering on the class-guided features; 4: select representative images from each cluster as candidates via (9); 5: compute class-wise confidence thresholds using previously labeled data via (11); 6: construct dataset D_r by assigning pseudo- or ground-truth labels via selective querying in (12); 7: reinitialize and train prompts t_r on dataset D_r with the objective in (13); 8: end for; 9: return final dataset D_R and trained prompts t_R. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | Following a previous study (Bang et al., 2024), we use seven publicly available image classification datasets: Oxford Pets (pet species) (Parkhi et al., 2012), FGVCAircraft (aircraft types) (Maji et al., 2013), Caltech101 (general object categories) (Fei-Fei et al., 2004), Flowers102 (flower species) (Nilsback & Zisserman, 2008), DTD (texture patterns) (Cimpoi et al., 2014), Stanford Cars (car models) (Krause et al., 2013), and EuroSAT (satellite land cover types) (Helber et al., 2019). |
| Dataset Splits | Yes | For a fair comparison, we follow the active learning scenario established in PCB (Bang et al., 2024). Specifically, experiments are conducted over 8 rounds, with the budget per round set to the number of classes, i.e., B = \|C\|. Although the total budget therefore varies across datasets, it is fully spent by the 8th round, so each round consumes at most 12.5% of the total budget. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as CLIP ViT-B/32, CoOp, and the SGD optimizer but does not provide specific version numbers for any software libraries or frameworks. |
| Experiment Setup | Yes | At each round r, we reinitialize the learnable prompts t_r, consisting of 16 vectors, using a Gaussian distribution with mean 0 and standard deviation 0.02. Following the training details in CoOp (Zhou et al., 2022b), we train these prompts for 200 epochs per round using the SGD optimizer, initialized with a learning rate of 0.002 and decayed according to a cosine annealing schedule. |
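The experiment-setup row above can be made concrete with a minimal, stdlib-only sketch. This is not the authors' code: it only instantiates the reported hyperparameters — 16 context vectors reinitialized from N(0, 0.02) each round, and an SGD learning rate starting at 0.002 and decayed to zero over 200 epochs by standard cosine annealing. The context-vector width `dim=512` is an assumption for illustration (the 512-d text embedding width of CLIP ViT-B/32); the paper does not state it in this excerpt.

```python
import math
import random

def reinit_prompts(n_ctx: int = 16, dim: int = 512, std: float = 0.02):
    """Draw fresh learnable context vectors from a zero-mean Gaussian.

    `dim` is an illustrative assumption (512-d, matching CLIP ViT-B/32
    text embeddings); the paper only specifies 16 vectors and std=0.02.
    """
    return [[random.gauss(0.0, std) for _ in range(dim)] for _ in range(n_ctx)]

def cosine_annealed_lr(epoch: int, total_epochs: int = 200,
                       lr0: float = 0.002) -> float:
    """Standard cosine annealing: lr0 at epoch 0, decaying to 0 at the end."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * epoch / total_epochs))

# One round's worth of setup: fresh prompts plus the per-epoch lr schedule.
prompts = reinit_prompts()
schedule = [cosine_annealed_lr(e) for e in range(201)]
```

A usage note: because the prompts are reinitialized every round (step 7 of Algorithm 1), both `reinit_prompts` and the schedule would be recomputed at the start of each of the 8 rounds rather than carried over.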