Vision Graph Prompting via Semantic Low-Rank Decomposition

Authors: Zixiang Ai, Zichen Liu, Jiahuan Zhou

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate our method significantly improves ViG's transfer performance on diverse downstream tasks, achieving results comparable to full fine-tuning while maintaining parameter efficiency. We conduct extensive experiments to evaluate the efficacy of our proposed method across diverse domains. Specifically, we test on ten image classification datasets using a pre-trained Vision GNN (ViG) backbone to validate its performance on vision tasks. Additionally, to demonstrate the generalizability of our approach, we extend it to traditional graph tasks, evaluating nine datasets from the fields of chemistry and biology. The results highlight the adaptability and robustness of our method across different scenarios.
Researcher Affiliation Academia Wangxuan Institute of Computer Technology, Peking University, Beijing, China. Correspondence to: Jiahuan Zhou <EMAIL>.
Pseudocode No The paper describes the proposed method using textual descriptions and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code is available at https://github.com/zhoujiahuan1991/ICML2025-VGP.
Open Datasets Yes For vision tasks, we employ 10 benchmarks listed in Table 1, covering various categories and diverse distributions, following the approaches of DAM-VP (Huang et al., 2023) and InsVP (Liu et al., 2024). The selected datasets include CIFAR10 (Krizhevsky et al., 2009), CIFAR (Krizhevsky et al., 2009), DTD (Cimpoi et al., 2014), CUB (Wah et al., 2011), NABirds (Van Horn et al., 2015), Stanford Dogs (Khosla et al., 2011), Oxford Flowers (Nilsback & Zisserman, 2008), Food (Bossard et al., 2014), GTSRB (Stallkamp et al., 2012), and SVHN (Netzer et al., 2011). ... For the chemistry domain, we use eight graph classification datasets from MoleculeNet (Wu et al., 2018) as downstream tasks. For the biology domain, we utilize a dataset consisting of 88K labeled protein ego networks, designed for predicting 5,000 coarse-grained biological functions, referred to as the PPI dataset.
Dataset Splits Yes Table 6: Vision dataset statistics used in our downstream adaptation tasks. Dataset ... Train Val Test ... DTD (Cimpoi et al., 2014) textures 47 1,880 1,880 1,880 ... For the chemistry domain, we use eight graph classification datasets from MoleculeNet (Wu et al., 2018) as downstream tasks. For the biology domain, we utilize a dataset consisting of 88K labeled protein ego networks, designed for predicting 5,000 coarse-grained biological functions, referred to as the PPI dataset. We adopt the challenging scaffold split for the chemistry datasets and the species split for the biology dataset, ensuring alignment with prior works (Hu et al., 2019).
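The scaffold split quoted above assigns whole scaffold groups to a single partition, so structurally similar molecules never leak across train/val/test. A minimal pure-Python sketch of this idea, assuming scaffold keys (e.g. Bemis-Murcko scaffolds) are precomputed; the greedy largest-group-first assignment is a common convention, not necessarily the paper's exact procedure:

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_val=0.1):
    """Split sample indices into train/val/test by scaffold group.

    `scaffolds` maps sample index -> scaffold key. Whole groups are
    assigned greedily (largest first), so no scaffold is split across
    partitions; group boundaries mean the fractions are approximate.
    """
    groups = defaultdict(list)
    for idx, key in scaffolds.items():
        groups[key].append(idx)

    n = len(scaffolds)
    train, val, test = [], [], []
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(val) + len(group) <= frac_val * n:
            val += group
        else:
            test += group
    return train, val, test


# Toy example: one large scaffold (8 molecules) and two singletons.
scaffolds = {i: "A" for i in range(8)}
scaffolds.update({8: "B", 9: "C"})
train, val, test = scaffold_split(scaffolds)
# All 8 "A" molecules land in train together; "B" fills val; "C" goes to test.
```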
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or other hardware specifications used for running the experiments.
Software Dependencies No We utilize the AdamW (Loshchilov & Hutter, 2017) optimizer for optimization and implement cosine learning rate annealing. The paper does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes Our experiments on vision tasks are based on a medium pyramid Vision GNN model pre-trained on ImageNet-21k (Krizhevsky et al., 2017). With the backbone parameters frozen, only our prompt modules and the task-specific head are trained. Following DAM-VP (Huang et al., 2023), we train for 100 epochs for each dataset and incorporate 10 additional epochs for probing the optimal result. We utilize the AdamW (Loshchilov & Hutter, 2017) optimizer for optimization and implement cosine learning rate annealing. The learning rate is set as 0.001 and the weight decay is 0.05. Regarding the graph tasks, we follow the approach of GPF-Plus (Fang et al., 2023), utilizing a widely used 5-layer GIN as the underlying architecture.
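The cosine learning rate annealing quoted above (base learning rate 0.001 over 100 epochs) implies a schedule like the following minimal sketch; the function name and the assumption that the rate anneals to zero are illustrative, not details taken from the paper.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, base_lr=1e-3, min_lr=0.0):
    """Cosine learning-rate annealing: decay base_lr to min_lr over
    total_epochs following half a cosine period."""
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )


# Schedule starts at the base rate, halves at the midpoint, ends at min_lr.
lrs = [cosine_annealing_lr(e) for e in (0, 50, 100)]  # -> [0.001, 0.0005, 0.0]
```

In a typical PyTorch setup this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)` wrapped around an AdamW optimizer with `lr=1e-3, weight_decay=0.05`, with the backbone's parameters excluded from the optimizer so only the prompt modules and head are updated.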