Sharpness-aware Zeroth-order Optimization for Graph Transformers

Authors: Yang Liu, Chuan Zhou, Yuhan Lin, Shuai Zhang, Yang Gao, Zhao Li, Shirui Pan

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over the state-of-the-art optimizers. (...) Supporting sections: 4 Experiment; 4.1 Experimental Setup; 4.2 Performance on Graph Classification Task; 4.3 Performance on Graph Regression Task
Researcher Affiliation | Collaboration | (1) Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (2) School of Cyber Security, University of Chinese Academy of Sciences; (3) Fudan University; (4) Zhejiang University; (5) Hangzhou Yugu Technology; (6) Griffith University
Pseudocode | Yes | The training procedure of the SZO algorithm is outlined in Algorithm 1.
Open Source Code | Yes | https://github.com/liu-yang-maker/SZO
Open Datasets | Yes | Following established settings in molecular graph tasks, we utilize nine public benchmark datasets: BBBP, Tox21, Sider, ClinTox and BACE for classification, and ESOL, Lipophilicity, QM7 and QM8 for regression. We assess all models using a random split methodology as recommended by MoleculeNet [Wu et al., 2018].
Dataset Splits | Yes | We assess all models using a random split methodology as recommended by MoleculeNet [Wu et al., 2018], dividing the datasets into training, validation, and testing sets with an 80%/10%/10% ratio.
Hardware Specification | No | The AI-driven experiments, simulations and model training were performed on the robotic AI-Scientist platform of the Chinese Academy of Sciences. Specific hardware details (e.g., GPU/CPU models, memory amounts) are not provided in the paper.
Software Dependencies | No | The implementations of the backbone models and their respective hyperparameter configurations are sourced from publicly available repositories as detailed in [Rong et al., 2020] and [Chen et al., 2021]. Both GROVER and CoMPT employ Adam as the base optimizer without any pre-training strategies. Specific software versions for dependencies such as Python, PyTorch, or CUDA are not mentioned.
Experiment Setup | No | In our experiments, we solely adjust the hyperparameters introduced by SZO. We implemented several optimizers (i.e., SGD, Adam and SZO) on molecular graph data using two widely-adopted graph transformer backbones: GROVER [Rong et al., 2020] and CoMPT [Chen et al., 2021]. The paper mentions adjusting hyperparameters and using specific optimizers, but does not explicitly list the concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) used in the main text.
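The "Pseudocode" row notes that SZO's training procedure appears as Algorithm 1 in the paper; the paper itself must be consulted for the sharpness-aware details. For orientation only, below is a minimal sketch of a generic two-point (SPSA-style) zeroth-order gradient estimator, the standard gradient-free building block that optimizers in this family query. All names (`zo_gradient`, `zo_step`) and hyperparameters are hypothetical illustrations, not the paper's Algorithm 1.

```python
import numpy as np

def zo_gradient(f, theta, mu=1e-3, num_dirs=10, rng=None):
    """Generic two-point zeroth-order gradient estimate of f at theta.

    Illustrative SPSA-style estimator: averages finite-difference
    directional derivatives along random Gaussian directions. It only
    needs function evaluations, never analytic gradients.
    """
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(theta)
    for _ in range(num_dirs):
        u = rng.standard_normal(theta.shape)  # random probe direction
        # Symmetric finite difference approximates the derivative along u
        grad += (f(theta + mu * u) - f(theta - mu * u)) / (2.0 * mu) * u
    return grad / num_dirs

def zo_step(f, theta, lr=0.1, **kwargs):
    """One gradient-free descent step using the estimator above."""
    return theta - lr * zo_gradient(f, theta, **kwargs)
```

On a simple quadratic such as `f(x) = sum(x**2)`, repeated calls to `zo_step` drive the loss down without ever computing an analytic gradient, which is the property zeroth-order optimizers exploit when backpropagation through the model is expensive or unavailable.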
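The "Dataset Splits" row describes a random 80%/10%/10% train/validation/test split, as recommended by MoleculeNet. A minimal index-level sketch of such a split (the function name and defaults are illustrative; DeepChem's `RandomSplitter` implements the same idea at the dataset level):

```python
import numpy as np

def random_split(n, seed=0, frac_train=0.8, frac_valid=0.1):
    """Randomly partition n sample indices into train/valid/test sets.

    Illustrative sketch of the MoleculeNet-style random split with an
    80%/10%/10% ratio; the remainder after train and valid goes to test.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # shuffle all indices once
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return (perm[:n_train],                       # training indices
            perm[n_train:n_train + n_valid],      # validation indices
            perm[n_train + n_valid:])             # test indices
```

Fixing the seed makes the split reproducible across runs, which matters when comparing optimizers (SGD, Adam, SZO) on the same data partitions.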