Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Prompt Tuning in a Compact Attribute Space
Authors: Shiyu Hou, Tianfei Zhou, Shuai Zhang, Ye Yuan, Guoren Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments are conducted on 11 visual recognition datasets, including ImageNet (Deng et al. 2009), Caltech101 (Fei-Fei, Fergus, and Perona 2004), Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Gool 2014), FGVC-Aircraft (Maji et al. 2013), SUN397 (Xiao et al. 2010), DTD (Cimpoi et al. 2014), EuroSAT (Helber et al. 2019) and UCF101 (Soomro, Zamir, and Shah 2012). For cross-domain generalization, we additionally evaluate on four ImageNet variants including ImageNet-V2 (Recht et al. 2019), ImageNet-Sketch (Wang et al. 2019), ImageNet-A (Hendrycks et al. 2021b) and ImageNet-R (Hendrycks et al. 2021a). We report recognition accuracy (%) and harmonic mean (HM) averaged over 3 seeds as final scores. ... Ablation Study We conduct ablative experiments to investigate the effect of core designs in our approach. |
| Researcher Affiliation | Academia | 1 Beijing Institute of Technology 2 Beijing Zhongguancun Laboratory |
| Pseudocode | No | The paper describes methods using mathematical equations and textual descriptions, but there are no explicitly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | Code: https://github.com/hhhoushiyu/PTinCAS |
| Open Datasets | Yes | The experiments are conducted on 11 visual recognition datasets, including ImageNet (Deng et al. 2009), Caltech101 (Fei-Fei, Fergus, and Perona 2004), Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Gool 2014), FGVC-Aircraft (Maji et al. 2013), SUN397 (Xiao et al. 2010), DTD (Cimpoi et al. 2014), EuroSAT (Helber et al. 2019) and UCF101 (Soomro, Zamir, and Shah 2012). For cross-domain generalization, we additionally evaluate on four ImageNet variants including ImageNet-V2 (Recht et al. 2019), ImageNet-Sketch (Wang et al. 2019), ImageNet-A (Hendrycks et al. 2021b) and ImageNet-R (Hendrycks et al. 2021a). |
| Dataset Splits | Yes | To assess PTinCAS's generalizability to unseen classes, we follow (Khattak et al. 2023a) to divide each dataset into base and novel classes. ... For each of the 1K classes in ImageNet, we sample 16 examples for training. ... Specifically, experiments excluding few-shot classification are conducted utilizing a selection of 16 randomly sampled shots per class. ... We train models with {1, 2, 4, 8, 16} examples per class for each dataset. |
| Hardware Specification | Yes | All models are trained on an NVIDIA RTX 3090 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions using CLIP and GPT-3.5 models and a ViT-B/16 CLIP model, but does not provide specific version numbers for underlying software libraries, programming languages, or other dependencies like PyTorch, TensorFlow, or CUDA versions. |
| Experiment Setup | Yes | By default, the coefficient λ for L_ATT is set as 2, and the clustering number K is set as 64. ... Following (Khattak et al. 2023a,b; Zhou et al. 2022a), we adopt a pre-trained ViT-B/16 CLIP model and predominantly employ a few-shot training approach. ... We strictly adhere to the training setting specified in the official implementation of each baseline. |