Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
Authors: Hari Chandana Kuchibhotla, Sai Srinivas Kancheti, Abbavaram Gowtham Reddy, Vineeth N. Balasubramanian
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we comprehensively evaluate the classification performance of NeaR for the VF-FGVR task. We begin by describing the datasets, metrics and benchmark methods we compare against. ... The results are shown in Table 3, with all numbers reported for 3-shot training images. ... We conducted a thorough ablation to evaluate the contribution of each component in our pipeline in Table 5. |
| Researcher Affiliation | Collaboration | Hari Chandana Kuchibhotla (EMAIL), Indian Institute of Technology Hyderabad, India; Sai Srinivas Kancheti (EMAIL), Indian Institute of Technology Hyderabad, India; Abbavaram Gowtham Reddy (EMAIL), CISPA Helmholtz Center for Information Security, Saarbrücken, Germany; Vineeth N Balasubramanian (EMAIL & EMAIL), Microsoft Research India & Indian Institute of Technology Hyderabad, India |
| Pseudocode | Yes | An overview of our methodology is presented in Figure 1, and the pseudocode is detailed in Algorithm 1 in the appendix. We begin by discussing the necessary preliminaries. ... Algorithm 1 NeaR algorithm: Training |
| Open Source Code | Yes | Our code is available at https://github.com/NeaR. |
| Open Datasets | Yes | We perform experiments on five benchmark fine-grained datasets: Caltech UCSD Bird-200 (Wah et al., 2011), Stanford Car-196 (Krause et al., 2013), Stanford Dog-120 (Khosla et al., 2011), Flower-102 (Nilsback & Zisserman, 2008), Oxford-IIIT Pet-37 (Parkhi et al., 2012). |
| Dataset Splits | Yes | Following (Liu et al., 2024a), for each dataset, NeaR and other baselines only have access to m unlabeled training images per class. Unless specified otherwise, we assume m = 3. Results for 1 ≤ m ≤ 10 are shown in Figure 2. ... Table A13: Train and test set sizes of the datasets used in this paper. The number of shots is denoted by m, with m = 3 used as the default in our experiments unless otherwise specified. |
| Hardware Specification | Yes | We run all our experiments on a single Nvidia Tesla V100-32GB GPU with an Nvidia driver version of 525.85.12. |
| Software Dependencies | Yes | We use PyTorch 2.4.0 and CUDA 12.0. We utilize the publicly available meta-llama/Llama-3.2-11B-Vision-Instruct model and Qwen/Qwen2-VL-2B-Instruct model from Hugging Face. |
| Experiment Setup | Yes | For both the CoOp baseline and our method, we introduce 16 trainable context vectors. The same set of prompts is optimized during the warmup stage and the subsequent training stage. We use the SGD optimizer with a learning rate of 0.002 and train for 50 epochs, including 10 warmup epochs, using a constant learning-rate schedule followed by cosine annealing. We use a temperature of 2 in the sharpening function. Our batch size is 32. The training hyperparameters are the same for CoOp and NeaR. |
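The optimization setup reported in the table (SGD at lr 0.002, 50 epochs with a 10-epoch warmup, a constant learning rate followed by cosine annealing, 16 trainable context vectors) can be sketched in PyTorch as below. This is a minimal illustration of the reported hyperparameters only: the context-vector dimensionality (512) and the placeholder training step are assumptions, not details from the report.

```python
import torch

# 16 trainable context vectors, as in CoOp-style prompt tuning.
# The embedding dimension (512) is an assumed placeholder.
ctx_vectors = torch.nn.Parameter(torch.randn(16, 512) * 0.02)

# SGD optimizer with the reported learning rate.
optimizer = torch.optim.SGD([ctx_vectors], lr=0.002)

# 50 total epochs, the first 10 of which are warmup.
warmup_epochs, total_epochs = 10, 50

# Constant learning rate during warmup, then cosine annealing,
# chained sequentially as the report describes.
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.ConstantLR(
            optimizer, factor=1.0, total_iters=warmup_epochs
        ),
        torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=total_epochs - warmup_epochs
        ),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    # Placeholder for the actual training step (forward, loss, backward).
    optimizer.step()
    scheduler.step()
```

With `factor=1.0` the warmup phase simply holds the learning rate at 0.002; after epoch 10 the cosine schedule decays it toward zero over the remaining 40 epochs.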