M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning
Authors: Chujie Zhao, Tianren Zhang, Guanyu Chen, Yizhou Jiang, Feng Chen
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that M3PL effectively boosts the model's generalization capability, achieving state-of-the-art performance under various distribution shifts. |
| Researcher Affiliation | Academia | Chujie Zhao EMAIL Department of Automation Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, such as in Section 5 'Analysis and Methodology' and its subsections, but does not present any formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Reproducibility. We provide publicly the source code of M3PL, which contains the configuration files we used, to ensure the reliability and reproducibility of our experimental results. |
| Open Datasets | Yes | For cross-dataset generalization and base-to-new generalization settings, we follow the protocols of Zhou et al. (2022a;b); Khattak et al. (2023a) and consider 11 recognition datasets, including ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004) for generic recognition, OxfordPets (Parkhi et al., 2012), StanfordCars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014) and FGVCAircraft (Maji et al., 2013) for fine-grained classification, SUN397 (Xiao et al., 2010) for scene classification, DTD (Cimpoi et al., 2014) for texture recognition, EuroSAT (Helber et al., 2019) for satellite image recognition, and UCF101 (Soomro et al., 2012) for action recognition. |
| Dataset Splits | Yes | Following standard experimental settings (Zhou et al., 2022b; Rasheed et al., 2023), we train a CLIP ViT-B/16 (Radford et al., 2021) on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'CLIP ViT-B/16' and 'SGD optimizer' but does not provide specific version numbers for any key software libraries or programming languages used. |
| Experiment Setup | Yes | For the prompt learning method, we employ the baseline IVLP (Rasheed et al., 2023)... we train a CLIP ViT-B/16 ... on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training... We utilize an SGD optimizer with a learning rate of 2.5e-3, weight decay of 5e-4, and training for 30 epochs (for a few datasets prone to overfitting, the training was limited to 20 epochs). The number of prompts, M, is set to 8, with a balance coefficient, λ, of 1.0 (and 1.2 for EuroSAT). |
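The reported hyperparameters can be collected in a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' code: the `nn.Parameter` stands in for the learnable prompt vectors of CLIP ViT-B/16, and only the optimizer settings and constants (lr 2.5e-3, weight decay 5e-4, 30 epochs, M = 8, λ = 1.0) come from the paper.

```python
import torch

# Placeholder for the M = 8 learnable prompt vectors (hypothetical stand-in;
# the paper optimizes prompts inside CLIP ViT-B/16, not this tensor).
NUM_PROMPTS = 8                       # M, as reported
prompts = torch.nn.Parameter(torch.randn(NUM_PROMPTS, 512))

# Optimizer configuration as reported in the Experiment Setup row.
optimizer = torch.optim.SGD(
    [prompts],
    lr=2.5e-3,                        # learning rate from the paper
    weight_decay=5e-4,                # weight decay from the paper
)

EPOCHS = 30                           # 20 for datasets prone to overfitting
LAMBDA = 1.0                          # balance coefficient λ (1.2 for EuroSAT)
```

Note that the paper gives no software versions, so any concrete PyTorch release used here is a guess; only the numeric settings above are grounded in the report.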