M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning

Authors: Chujie Zhao, Tianren Zhang, Guanyu Chen, Yizhou Jiang, Feng Chen

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that M3PL effectively boosts the model's generalization capability, achieving state-of-the-art performance under various distribution shifts. |
| Researcher Affiliation | Academia | Chujie Zhao EMAIL, Department of Automation, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, such as in Section 5 'Analysis and Methodology' and its subsections, but does not present any formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Reproducibility. We provide publicly the source code of M3PL, which contains the configuration files we used, to ensure the reliability and reproducibility of our experimental results. |
| Open Datasets | Yes | For cross-dataset generalization and base-to-new generalization settings, we follow the protocols of Zhou et al. (2022a;b); Khattak et al. (2023a) and consider 11 recognition datasets, including ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004) for generic recognition, Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014) and FGVCAircraft (Maji et al., 2013) for fine-grained classification, SUN397 (Xiao et al., 2010) for scene classification, DTD (Cimpoi et al., 2014) for texture recognition, EuroSAT (Helber et al., 2019) for satellite image recognition, and UCF101 (Soomro et al., 2012) for action recognition. |
| Dataset Splits | Yes | Following standard experimental settings (Zhou et al., 2022b; Rasheed et al., 2023), we train a CLIP ViT-B/16 (Radford et al., 2021) on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'CLIP ViT-B/16' and 'SGD optimizer' but does not provide specific version numbers for any key software libraries or programming languages used. |
| Experiment Setup | Yes | For the prompt learning method, we employ the baseline IVLP (Rasheed et al., 2023)... we train a CLIP ViT-B/16 ... on ImageNet in a few-shot fashion, by randomly sampling 16 images per class in training... We utilize an SGD optimizer with a learning rate of 2.5e-3, weight decay of 5e-4, and training for 30 epochs (for a few datasets prone to overfitting, the training was limited to 20 epochs). The number of prompts, M, is set to 8, with a balance coefficient, λ, of 1.0 (and 1.2 for EuroSAT). |
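The reported protocol (16-shot sampling per class, SGD with the stated hyperparameters) can be sketched as below. This is a minimal illustration of the described setup, not the authors' code: the `sample_few_shot` helper, its `(image_id, label)` dataset format, and the `TRAIN_CONFIG` dict are all hypothetical names introduced here for clarity.

```python
import random
from collections import defaultdict

# Hyperparameters as reported in the paper's experiment setup.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "lr": 2.5e-3,
    "weight_decay": 5e-4,
    "epochs": 30,           # limited to 20 for datasets prone to overfitting
    "num_prompts": 8,       # M
    "lambda": 1.0,          # balance coefficient (1.2 for EuroSAT)
    "shots_per_class": 16,  # few-shot sampling on ImageNet
}

def sample_few_shot(dataset, shots=16, seed=0):
    """Randomly sample `shots` images per class, mirroring the paper's
    few-shot protocol (16 images per class). `dataset` is a list of
    (image_id, class_label) pairs; this representation is illustrative."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in dataset:
        by_class[label].append(image_id)
    subset = []
    for label in sorted(by_class):
        ids = by_class[label]
        k = min(shots, len(ids))  # guard against classes with < `shots` images
        subset.extend((i, label) for i in rng.sample(ids, k))
    return subset
```

For example, applying `sample_few_shot` to a dataset with two classes of 20 images each yields a 32-image training subset, 16 per class.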