Semantic Alignment for Prompt-Tuning in Vision Language Models
Authors: Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N. Balasubramanian
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments conducted across 11 benchmark datasets show that our method outperforms established methods, demonstrating substantial improvements. Fig. 1 illustrates the effectiveness of SAP over other baselines on two benchmarks, Generalized Zero-Shot Classification (GZS) and Base-to-Novel Classification (B2N), defined in Section 5. As our semantic alignment is part-level, SAP showcases superior localization of visual concepts relevant to a class description, as seen through class activation maps, when compared to other baselines. |
| Researcher Affiliation | Academia | Hari Chandana Kuchibhotla EMAIL, Indian Institute of Technology Hyderabad, India; Sai Srinivas Kancheti EMAIL, Indian Institute of Technology Hyderabad, India; Abbavaram Gowtham Reddy EMAIL, CISPA Helmholtz Center for Information Security, Saarbrücken, Germany; Vineeth N Balasubramanian EMAIL, Indian Institute of Technology Hyderabad, India |
| Pseudocode | Yes | The overall algorithm of SAP is presented in Appendix B. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We follow (Zhou et al., 2021; 2022; Khattak et al., 2022; 2023) to evaluate our method on 11 image classification datasets of varying complexity. These datasets encompass diverse domains, including generic object datasets like ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004); fine-grained datasets like Stanford Cars (Krause et al., 2013), Oxford Pets (Parkhi et al., 2012), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013); scene recognition dataset SUN397 (Xiao et al., 2010); action recognition dataset UCF101 (Soomro et al., 2012); texture dataset DTD (Cimpoi et al., 2013), and satellite image dataset EuroSAT (Helber et al., 2017). |
| Dataset Splits | Yes | In GZS, the label space of a dataset is equally split into disjoint base and novel classes. Following prior work (Zhou et al., 2021; 2022; Khattak et al., 2022; Yao et al., 2023; Khattak et al., 2023; Shi & Yang, 2023), only a small number (16-shot) of labeled samples from the base classes are available as training data, and the model is fine-tuned on this few-shot training split. During evaluation, however, the classification label space is the union of base and novel classes. |
| Hardware Specification | No | The paper does not contain any specific details about the hardware (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific Python, PyTorch, or CUDA versions) required to reproduce the experiments. |
| Experiment Setup | Yes | The final objective is $\mathcal{L}(\rho) = \mathcal{L}_{ce}(\rho) + \lambda_1 \mathcal{L}^{v}_{steer}(\rho) + \lambda_2 \mathcal{L}^{t}_{steer}(\rho)$, where $\lambda_1$ and $\lambda_2$ are hyperparameters, $\tau$ is the temperature parameter, and sim is cosine similarity. We use the L1 penalty to regularize global image features and description-guided text features. |
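The GZS protocol quoted in the Dataset Splits row (equal disjoint base/novel class split, 16-shot training on base classes, evaluation over the union) can be sketched as follows. This is a hypothetical helper, not the authors' code; `gzs_split` and its arguments are assumed names.

```python
import random

def gzs_split(class_names, samples_by_class, num_shots=16, seed=0):
    """Split classes equally into disjoint base/novel sets and draw a
    16-shot training subset from the base classes (sketch of the GZS
    protocol described in the paper, not the authors' implementation)."""
    rng = random.Random(seed)
    classes = sorted(class_names)
    half = len(classes) // 2
    base, novel = classes[:half], classes[half:]
    # few-shot training data comes only from base classes
    train = {
        c: rng.sample(samples_by_class[c], min(num_shots, len(samples_by_class[c])))
        for c in base
    }
    # evaluation label space is the union of base and novel classes
    eval_label_space = base + novel
    return base, novel, train, eval_label_space
```

In this sketch the split is deterministic given the sorted class order; the actual base/novel partitions follow the splits fixed by prior work (Zhou et al., 2022).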
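The final objective in the Experiment Setup row can likewise be sketched. Since the excerpt does not define the steering losses precisely, both are modeled here as L1 penalties toward stand-in steering targets (`steer_img`, `steer_txt`), and all function names are assumptions rather than the authors' API.

```python
import numpy as np

def softmax_ce(logits, label):
    # numerically stable cross-entropy over temperature-scaled similarities
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def sap_loss(img_feat, class_feats, label, steer_img, steer_txt,
             lam1=0.5, lam2=0.5, tau=0.07):
    """Sketch of L = L_ce + lam1 * L_v_steer + lam2 * L_t_steer.
    L_v_steer / L_t_steer are modeled as L1 penalties on the global image
    and description-guided text features, per the excerpt's description."""
    # cosine similarity of the image feature to each class text feature
    sims = np.array([
        img_feat @ c / (np.linalg.norm(img_feat) * np.linalg.norm(c))
        for c in class_feats
    ])
    l_ce = softmax_ce(sims / tau, label)
    l_v_steer = np.abs(img_feat - steer_img).mean()            # L1 on image features
    l_t_steer = np.abs(class_feats[label] - steer_txt).mean()  # L1 on text features
    return l_ce + lam1 * l_v_steer + lam2 * l_t_steer
```

When the steering targets coincide with the current features, both penalties vanish and the objective reduces to the cross-entropy term, matching the additive form of the quoted equation.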