Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Prompt Tuning in a Compact Attribute Space
Authors: Shiyu Hou, Tianfei Zhou, Shuai Zhang, Ye Yuan, Guoren Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments are conducted on 11 visual recognition datasets, including ImageNet (Deng et al. 2009), Caltech101 (Fei-Fei, Fergus, and Perona 2004), Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Gool 2014), FGVC-Aircraft (Maji et al. 2013), SUN397 (Xiao et al. 2010), DTD (Cimpoi et al. 2014), EuroSAT (Helber et al. 2019) and UCF101 (Soomro, Zamir, and Shah 2012). For cross-domain generalization, we additionally evaluate on four ImageNet variants including ImageNet-V2 (Recht et al. 2019), ImageNet-Sketch (Wang et al. 2019), ImageNet-A (Hendrycks et al. 2021b) and ImageNet-R (Hendrycks et al. 2021a). We report recognition accuracy (%) and harmonic mean (HM) averaged over 3 seeds as final scores. ... Ablation Study We conduct ablative experiments to investigate the effect of core designs in our approach. |
| Researcher Affiliation | Academia | 1 Beijing Institute of Technology 2 Beijing Zhongguancun Laboratory |
| Pseudocode | No | The paper describes methods using mathematical equations and textual descriptions, but there are no explicitly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | Code: https://github.com/hhhoushiyu/PTinCAS |
| Open Datasets | Yes | The experiments are conducted on 11 visual recognition datasets, including ImageNet (Deng et al. 2009), Caltech101 (Fei-Fei, Fergus, and Perona 2004), Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Gool 2014), FGVC-Aircraft (Maji et al. 2013), SUN397 (Xiao et al. 2010), DTD (Cimpoi et al. 2014), EuroSAT (Helber et al. 2019) and UCF101 (Soomro, Zamir, and Shah 2012). For cross-domain generalization, we additionally evaluate on four ImageNet variants including ImageNet-V2 (Recht et al. 2019), ImageNet-Sketch (Wang et al. 2019), ImageNet-A (Hendrycks et al. 2021b) and ImageNet-R (Hendrycks et al. 2021a). |
| Dataset Splits | Yes | To assess PTinCAS's generalizability to unseen classes, we follow (Khattak et al. 2023a) to divide each dataset into base and novel classes. ... For each of the 1K classes in ImageNet, we sample 16 examples for training. ... Specifically, experiments excluding few-shot classification are conducted utilizing a selection of 16 randomly sampled shots per class. ... We train models with {1, 2, 4, 8, 16} examples per class for each dataset. |
| Hardware Specification | Yes | All models are trained on an NVIDIA RTX 3090 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions using CLIP and GPT-3.5 models and a ViT-B/16 CLIP model, but does not provide specific version numbers for underlying software libraries, programming languages, or other dependencies like PyTorch, TensorFlow, or CUDA versions. |
| Experiment Setup | Yes | By default, the coefficient λ for L_ATT is set as 2, and the clustering number K is set as 64. ... Following (Khattak et al. 2023a,b; Zhou et al. 2022a), we adopt a pre-trained ViT-B/16 CLIP model and predominantly employ a few-shot training approach. ... We strictly adhere to the training setting specified in the official implementation of each baseline. |