Sharpness-aware Zeroth-order Optimization for Graph Transformers
Authors: Yang Liu, Chuan Zhou, Yuhan Lin, Shuai Zhang, Yang Gao, Zhao Li, Shirui Pan
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over the state-of-the-art optimizers. (...) 4 Experiment 4.1 Experimental Setup 4.2 Performance on Graph Classification Task 4.3 Performance on Graph Regression Task |
| Researcher Affiliation | Collaboration | 1Academy of Mathematics and Systems Science, Chinese Academy of Sciences 2School of Cyber Security, University of Chinese Academy of Sciences 3Fudan University 4Zhejiang University 5Hangzhou Yugu Technology 6Griffith University |
| Pseudocode | Yes | The training procedure of the SZO algorithm is outlined in Algorithm 1. |
| Open Source Code | Yes | https://github.com/liu-yang-maker/SZO |
| Open Datasets | Yes | Following established settings in molecular graph tasks, we utilize nine public benchmark datasets: BBBP, Tox21, SIDER, ClinTox and BACE for classification, and ESOL, Lipophilicity, QM7 and QM8 for regression. We assess all models using a random split methodology as recommended by MoleculeNet [Wu et al., 2018] |
| Dataset Splits | Yes | We assess all models using a random split methodology as recommended by MoleculeNet [Wu et al., 2018], dividing the datasets into training, validation, and testing sets with an 80%/10%/10% ratio. |
| Hardware Specification | No | The AI-driven experiments, simulations and model training were performed on the robotic AI-Scientist platform of the Chinese Academy of Sciences. Specific hardware details (e.g., GPU/CPU models, memory amounts) are not provided in the paper. |
| Software Dependencies | No | The implementations of the backbone models and their respective hyperparameter configurations are sourced from publicly available repositories as detailed in [Rong et al., 2020] and [Chen et al., 2021]. Both GROVER and CoMPT employ Adam as the base optimizer without employing any pre-training strategies. Specific software versions for dependencies like Python, PyTorch, or CUDA are not mentioned. |
| Experiment Setup | No | In our experiments, we solely adjust the hyperparameters introduced by SZO. We implemented several optimizers (i.e., SGD, Adam and SZO) on molecular graph data using two widely-adopted graph transformer backbones: GROVER [Rong et al., 2020] and CoMPT [Chen et al., 2021]. The paper names the optimizers and backbones used, but the main text does not list concrete hyperparameter values (e.g., learning rate, batch size, number of epochs). |
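The paper's Algorithm 1 (SZO) is not reproduced in the excerpts above. As general background only, the sketch below shows the standard two-point zeroth-order gradient estimator that methods of this family build on; it is not the authors' sharpness-aware variant, and the function names, step size, and smoothing parameter `mu` are illustrative assumptions.

```python
import numpy as np

def zo_grad_estimate(loss_fn, theta, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate along a random Gaussian
    direction: (f(theta + mu*u) - f(theta - mu*u)) / (2*mu) * u.
    Uses only loss evaluations, no backpropagation."""
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal(theta.shape)
    return (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2 * mu) * u

# Toy usage: minimize f(x) = ||x||^2 with plain ZO-SGD (not SZO).
f = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(0)
x = np.ones(5)
for _ in range(500):
    x -= 0.05 * zo_grad_estimate(f, x, rng=rng)
```

In expectation the estimate equals the true gradient, so the toy loop drives the loss toward zero despite never calling a gradient routine.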
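The 80%/10%/10% random split reported in the Dataset Splits row can be sketched as follows; this is a minimal illustration of the split ratio, not the MoleculeNet or authors' code, and the function name, fractions as defaults, and seed are assumptions.

```python
import random

def random_split(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and split into train/val/test sets
    (80%/10%/10% by default); the test set takes the remainder."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(train_frac * n_samples)
    n_val = int(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = random_split(1000)
```

Because the three slices partition the shuffled index list, every sample lands in exactly one set.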