Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, DVP outperforms baselines on average across 11 downstream datasets. Notably, the DVP-PRM integration enables insights into how individual visual prompts influence classification decisions, providing a probabilistic framework for understanding reprogramming. Our code is available at https://github.com/tmlr-group/DecoupledVP Section 5 shows the application of DVP to 11 commonly used downstream datasets and four CLIP backbones, demonstrating its effectiveness. The parameter analysis, ablation experiments, and independence tests further validate the rationality of DVP. In conclusion, both theoretical analysis and experimental results verify the soundness of DVP. |
| Researcher Affiliation | Collaboration | 1School of Computing and Information Systems, The University of Melbourne 2School of Computer Science and Engineering, Southeast University 3Idealism Technology (Beijing). Correspondence to: Feng Liu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Pipeline of DVP |
| Open Source Code | Yes | Our code is available at https://github.com/tmlr-group/DecoupledVP |
| Open Datasets | Yes | All datasets are publicly available and listed as follows: FGVCAircraft (Aircraft) (Maji et al., 2013), Caltech101 (Caltech) (Fei-Fei et al., 2004), Stanford Cars (Cars) (Krause et al., 2013), Texture (DTD) (Cimpoi et al., 2014), EuroSAT (ESAT) (Helber et al., 2019), Flowers102 (Flowers) (Nilsback & Zisserman, 2008), Food101 (Food) (Bossard et al., 2014), Oxford Pets (Pets) (Parkhi et al., 2012), SUN397 (SUN) (Xiao et al., 2010), UCF101 (UCF) (Soomro et al., 2012), Resisc45 (Resisc) (Cheng et al., 2017). |
| Dataset Splits | Yes | We follow the prior work (Cai et al., 2025) to set up our benchmark, employing the same methodology to split the 16-shot training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for the experiments. It only mentions general support from 'The University of Melbourne's Research Computing Services and the Petascale Campus Initiative'. |
| Software Dependencies | No | The paper mentions using an 'SGD optimizer' and a 'cosine annealing scheduler' but does not specify the version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow, scikit-learn) or specific tools used beyond general algorithmic references like K-means. |
| Experiment Setup | Yes | All VR baseline methods are trained with consistent settings: a learning rate of 40, a momentum of 0.9 (SGD optimizer (Harold et al., 1997)), and a cosine annealing scheduler (Loshchilov & Hutter, 2016), over 200 epochs. Results are averaged across three random seeds. For method-specific hyper-parameters, we followed (Cai et al., 2025) by using a VR noise pattern with a frame size of 30 for VP (Bahng et al., 2022) and a frame size of 16 for AR (Chen et al., 2023; Tsai et al., 2020) and AttrVR (Cai et al., 2025). To ensure fairness, our DVP utilized the same settings as AttrVR. For DVP-cls, we use the same descriptions as (Cai et al., 2025). For K-means, we use a maximum of 300 iterations and set the relative tolerance on the Frobenius norm of the change in cluster centers to 1e-4. For DVP-cse, we use GPT-4o-mini (Brown et al., 2020) to generate descriptions, with the maximum token count set to 50, generation stopped at '.', and the temperature set to 0.99. We set m = 20 for each cause number in DVP-cse and the attribute numbers in DVP-cls. |
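The reported training schedule (learning rate 40, cosine annealing over 200 epochs) follows the standard formula of Loshchilov & Hutter (2016). A minimal sketch of that schedule, assuming a minimum learning rate of 0 and no warm restarts (the paper does not state either detail):

```python
import math

def cosine_annealing_lr(epoch, total_epochs=200, lr_max=40.0, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2016): decay the learning
    rate from lr_max at epoch 0 to lr_min at epoch total_epochs.
    lr_min=0.0 is an assumption; the paper only reports lr_max and epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

# The schedule starts at 40, reaches 20 at the midpoint, and ends near 0.
print(cosine_annealing_lr(0), cosine_annealing_lr(100), cosine_annealing_lr(200))
```

In frameworks such as PyTorch this corresponds to pairing an SGD optimizer (momentum 0.9) with a `CosineAnnealingLR` scheduler whose `T_max` equals the epoch count.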
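The K-means settings quoted above (max 300 iterations, tolerance 1e-4 on the Frobenius norm of the change in cluster centers) match scikit-learn's defaults, where `tol` is additionally scaled by the mean variance of the data. A hedged sketch of just the stopping test, using the unscaled norm for simplicity (the scaling detail is an assumption about the authors' implementation):

```python
import math

def centers_converged(old_centers, new_centers, tol=1e-4):
    """Return True when the Frobenius norm of the difference between
    successive cluster-center matrices falls below tol. scikit-learn
    scales tol by the data's mean variance; this sketch compares the
    raw norm directly."""
    frob = math.sqrt(sum((a - b) ** 2
                         for oc, nc in zip(old_centers, new_centers)
                         for a, b in zip(oc, nc)))
    return frob < tol
```

With `max_iter=300`, iteration would stop at whichever comes first: this convergence test passing or the iteration cap being reached.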