Selective Visual Prompting in Vision Mamba

Authors: Yifeng Yao, Zichen Liu, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou

AAAI 2025

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. "Extensive experimental results on various large-scale benchmarks demonstrate that our proposed SVP significantly outperforms state-of-the-art methods."
Researcher Affiliation: Academia. "Wangxuan Institute of Computer Technology, Peking University, Beijing 100871, China EMAIL, EMAIL"
Pseudocode: No. The paper describes methods using mathematical formulations and diagrams (Figures 1, 2, 3) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' sections, nor does it present structured, code-like procedural steps.
Open Source Code: Yes. https://github.com/zhoujiahuan1991/AAAI2025-SVP
Open Datasets: Yes. "Following prior works (Huang et al. 2023; Pei et al. 2024), our experiments are carried out on two image classification benchmarks HTA and VTAB. HTA. The head tuning adaptation benchmark (Huang et al. 2023) comprises 10 datasets including CIFAR10 (Krizhevsky, Hinton et al. 2009), CIFAR100 (Krizhevsky, Hinton et al. 2009), DTD (Cimpoi et al. 2014), CUB200 (Wah et al. 2011), NABirds (Van Horn et al. 2015), Stanford-Dogs (Khosla et al. 2011), Oxford-Flowers (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), GTSRB (Stallkamp et al. 2012) and SVHN (Netzer et al. 2011). VTAB-1K. It collects 19 benchmarks from Visual Task Adaptation (Zhai et al. 2019)... Our experiments primarily involve three pre-trained vision models: ViT-Small/16 and Vim-Small, both of which are pre-trained on ImageNet-1K (Russakovsky et al. 2015), and ViT-Base/16 (Dosovitskiy et al. 2020), which is pre-trained on ImageNet-21K (Krizhevsky, Sutskever, and Hinton 2012)."
Dataset Splits: Yes. "VTAB-1K. It collects 19 benchmarks from Visual Task Adaptation (Zhai et al. 2019), categorized into three groups: i) Natural, ii) Specialized, and iii) Structured, each with 1000 training examples. Following (Zhai et al. 2019; Jia et al. 2022), we use an 800-200 train/val split."
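The 800-200 split of each VTAB-1K task's 1000 training examples can be sketched as follows. This is a minimal illustration, not the released code: the function name, the seed, and shuffling before splitting are all assumptions (the paper does not specify how the split is drawn).

```python
import random

def train_val_split(examples, n_train=800, n_val=200, seed=0):
    """Split a VTAB-1K task's 1000 training examples into an
    800/200 train/val split. Shuffling and the seed are
    illustrative assumptions, not from the paper."""
    if len(examples) < n_train + n_val:
        raise ValueError("not enough examples for the requested split")
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

train, val = train_val_split(range(1000))
print(len(train), len(val))  # 800 200
```

The key property is that train and val are disjoint subsets covering all 1000 examples, matching the 800-200 split reported above.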
Hardware Specification: No. The paper discusses various pre-trained vision models and datasets used in experiments but does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for running the experiments.
Software Dependencies: No. The paper mentions using the AdamW optimizer and cosine annealing, but it does not specify versions for any programming languages, libraries (e.g., PyTorch, TensorFlow), or other software dependencies required to reproduce the experiments.
Experiment Setup: Yes. "Following (Huang et al. 2023), all methods are trained for 100 epochs across all datasets for a fair comparison. For the compared methods, we use the optimizers specified in the original papers to achieve better performance. In our approach, we utilize the AdamW (Loshchilov and Hutter 2017) optimizer for optimization and implement cosine annealing. The number of shared layers in Cross-Prompting is set to 4, 8, or 12, depending on the dataset, and the hidden dimension of the inner-prompts generator is set to 64."
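The cosine-annealing schedule used with AdamW follows the standard SGDR formula from Loshchilov and Hutter. A minimal stdlib-only sketch of that schedule is below; the peak and minimum learning rates are placeholders, since the paper does not report them.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Standard cosine-annealing schedule (SGDR): the learning rate
    decays from lr_max at epoch 0 to lr_min at total_epochs.
    lr_max and lr_min here are illustrative placeholders."""
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)

# With the paper's 100-epoch budget and an assumed peak lr of 1e-3:
for epoch in (0, 50, 100):
    print(epoch, cosine_annealed_lr(epoch, 100, 1e-3))
# 0   -> 1e-3 (peak)
# 50  -> 5e-4 (halfway)
# 100 -> 0.0  (fully annealed)
```

In a PyTorch setup this schedule would typically be applied via `torch.optim.lr_scheduler.CosineAnnealingLR` on top of an `AdamW` optimizer, but the released code should be consulted for the exact configuration.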