Active Prompt Learning with Vision-Language Model Priors

Authors: Hoyoung Kim, Seokhee Jin, Changhwan Sung, Jaechang Kim, Jungseul Ok

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments in active learning scenarios across seven datasets demonstrate that our method outperforms existing baselines. Furthermore, our data-centric approach complements existing model-centric prompt learning methods, offering a general strategy for scalable VLM adaptation.
Researcher Affiliation Academia Hoyoung Kim (EMAIL), Seokhee Jin (EMAIL), Changhwan Sung (EMAIL), Jaechang Kim (EMAIL), and Jungseul Ok (EMAIL), all with the Graduate School of Artificial Intelligence, POSTECH.
Pseudocode Yes Algorithm 1 Proposed Active Prompt Learning
Require: Image set I, initial prompts t0, budget per round B, and the number of rounds R
1: for r = 1, 2, . . . , R do
2:   Extract class-guided features for each image i ∈ I by combining image and weighted text features via (7)
3:   Perform K-means clustering on the class-guided features
4:   Select representative images from each cluster as candidates via (9)
5:   Compute class-wise confidence thresholds using previously labeled data via (11)
6:   Construct dataset Dr by assigning pseudo or ground-truth labels via selective querying in (12)
7:   Reinitialize and train prompts tr using dataset Dr with the objective in (13)
8: end for
9: return Final dataset DR and trained prompts tR
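Steps 2-4 of the algorithm above can be sketched numerically. The snippet below is a minimal, illustrative mock-up only: the random stand-in features, the softmax weighting used for the class-guided combination in (7), and the nearest-to-centroid rule for (9) are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=20):
    # Plain NumPy k-means: random init, then alternate assign/update.
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return assign, centers

# Toy stand-ins for CLIP image/text features (shapes are illustrative).
n_images, n_classes, dim = 60, 4, 8
img_feats = rng.normal(size=(n_images, dim))
txt_feats = rng.normal(size=(n_classes, dim))

# Step 2, cf. (7): combine each image feature with text features
# weighted by the model's class probabilities (assumed form).
logits = img_feats @ txt_feats.T
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
guided = img_feats + probs @ txt_feats

# Steps 3-4, cf. (9): cluster, then take the image closest to each
# cluster centroid as that cluster's candidate for querying.
assign, centers = kmeans(guided, n_classes)
candidates = []
for j in range(n_classes):
    members = np.where(assign == j)[0]
    if len(members) == 0:
        continue  # skip empty clusters
    d = ((guided[members] - centers[j]) ** 2).sum(1)
    candidates.append(int(members[d.argmin()]))
print(sorted(candidates))
```

Each candidate index comes from a distinct cluster, so the selected images are diverse by construction.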
Open Source Code No The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets Yes Following a previous study (Bang et al., 2024), we use seven publicly available image classification datasets: Oxford Pets (pet species) (Parkhi et al., 2012), FGVC-Aircraft (aircraft types) (Maji et al., 2013), Caltech101 (general object categories) (Fei-Fei et al., 2004), Flowers102 (flower species) (Nilsback & Zisserman, 2008), DTD (texture patterns) (Cimpoi et al., 2014), Stanford Cars (car models) (Krause et al., 2013), and EuroSAT (satellite land cover types) (Helber et al., 2019).
Dataset Splits Yes For a fair comparison, we follow the active learning scenario established in PCB (Bang et al., 2024). Specifically, experiments are conducted over 8 rounds, with the budget per round set to the number of classes, i.e. B = |C|. Although the total budget therefore varies across datasets, it is always fully spent (100%) by the 8th round, so at most 12.5% of the total budget is spent per round.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The paper mentions software such as CLIP ViT-B/32, CoOp, and the SGD optimizer but does not provide specific version numbers for any software libraries or frameworks.
Experiment Setup Yes At each round r, we reinitialize the learnable prompts tr, consisting of 16 vectors, by sampling from a Gaussian distribution with mean 0 and standard deviation 0.02. Following the training details in CoOp (Zhou et al., 2022b), we train these prompts for 200 epochs per round using the SGD optimizer, with an initial learning rate of 0.002 decayed by a cosine annealing schedule.
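The stated setup (Gaussian prompt initialization plus cosine-annealed SGD learning rate) can be sketched as below. The context dimension of 512 is an assumption matching CLIP ViT-B/32's text embedding width, and the closed-form schedule mirrors the standard cosine annealing formula; neither is taken from the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Reinitialize 16 learnable context vectors ~ N(0, 0.02^2) per round.
# ctx_dim = 512 is assumed (CLIP ViT-B/32 text width), not stated.
n_ctx, ctx_dim = 16, 512
prompts = rng.normal(loc=0.0, scale=0.02, size=(n_ctx, ctx_dim))

# Cosine-annealed learning rate over 200 epochs from lr0 = 0.002:
# lr(e) = lr0 * 0.5 * (1 + cos(pi * e / epochs)).
epochs, lr0 = 200, 0.002
lrs = [lr0 * 0.5 * (1 + math.cos(math.pi * e / epochs)) for e in range(epochs)]
print(f"lr at epoch 0: {lrs[0]:.4f}, lr at last epoch: {lrs[-1]:.2e}")
```

In a full implementation this is typically handled by an optimizer and scheduler (e.g. PyTorch's `SGD` with `CosineAnnealingLR`) rather than a hand-rolled list.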