M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning

Authors: Chujie Zhao, Tianren Zhang, Guanyu Chen, Yizhou Jiang, Feng Chen

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that M3PL effectively boosts the model's generalization capability, achieving state-of-the-art performance under various distribution shifts. |
| Researcher Affiliation | Academia | Chujie Zhao EMAIL, Department of Automation, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, such as in Section 5 'Analysis and Methodology' and its subsections, but does not present any formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Reproducibility. We provide publicly the source code of M3PL, which contains the configuration files we used, to ensure the reliability and reproducibility of our experimental results. |
| Open Datasets | Yes | For cross-dataset generalization and base-to-new generalization settings, we follow the protocols of Zhou et al. (2022a;b); Khattak et al. (2023a) and consider 11 recognition datasets, including ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004) for generic recognition, Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014) and FGVCAircraft (Maji et al., 2013) for fine-grained classification, SUN397 (Xiao et al., 2010) for scene classification, DTD (Cimpoi et al., 2014) for texture recognition, EuroSAT (Helber et al., 2019) for satellite image recognition, and UCF101 (Soomro et al., 2012) for action recognition. |
| Dataset Splits | Yes | Following standard experimental settings (Zhou et al., 2022b; Rasheed et al., 2023), we train a CLIP ViT-B/16 (Radford et al., 2021) on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'CLIP ViT-B/16' and 'SGD optimizer' but does not provide specific version numbers for any key software libraries or programming languages used. |
| Experiment Setup | Yes | For the prompt learning method, we employ the baseline IVLP (Rasheed et al., 2023)... we train a CLIP ViT-B/16 ... on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training... We utilize an SGD optimizer with a learning rate of 2.5e-3, weight decay of 5e-4, and training for 30 epochs (for a few datasets prone to overfitting, the training was limited to 20 epochs). The number of prompts, M, is set to 8, with a balance coefficient, λ, of 1.0 (and 1.2 for EuroSAT). |
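The reported protocol (16-shot sampling per class, SGD with the stated hyperparameters) can be sketched as below. This is a minimal illustration of the described setup, not the authors' code: the `sample_few_shot` helper, its `(image_id, label)` dataset format, and the `TRAIN_CONFIG` dict are all hypothetical names introduced here for clarity.

```python
import random
from collections import defaultdict

# Hyperparameters as reported in the paper's experiment setup.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "lr": 2.5e-3,
    "weight_decay": 5e-4,
    "epochs": 30,           # limited to 20 for datasets prone to overfitting
    "num_prompts": 8,       # M
    "lambda": 1.0,          # balance coefficient (1.2 for EuroSAT)
    "shots_per_class": 16,  # few-shot sampling on ImageNet
}

def sample_few_shot(dataset, shots=16, seed=0):
    """Randomly sample `shots` images per class, mirroring the paper's
    few-shot protocol (16 images per class). `dataset` is a list of
    (image_id, class_label) pairs; this representation is illustrative."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in dataset:
        by_class[label].append(image_id)
    subset = []
    for label in sorted(by_class):
        ids = by_class[label]
        k = min(shots, len(ids))  # guard against classes with < `shots` images
        subset.extend((i, label) for i in rng.sample(ids, k))
    return subset
```

For example, applying `sample_few_shot` to a dataset with two classes of 20 images each yields a 32-image training subset, 16 per class.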