Exact Certification of (Graph) Neural Networks Against Label Poisoning

Authors: Mahalakshmi Sabanayagam, Lukas Gosch, Stephan Günnemann, Debarghya Ghoshdastidar

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In Sec. 4.1 we thoroughly investigate our sample-wise and collective certificates. Sec. 4.2 discusses in detail the effect of architectural choices and graph structure. Datasets. We use the real-world graph datasets Cora-ML (Bojchevski & Günnemann, 2018) and Citeseer (Giles et al., 1998) for multi-class certification. We evaluate binary class certification using Polblogs (Adamic & Glance, 2005), and by extracting the subgraphs containing the top two largest classes from Cora-ML, Citeseer, Wiki-CS (Mernyei & Cangea, 2020), Cora (McCallum et al., 2000) and Chameleon (Rozemberczki et al., 2021), referring to these as Cora-MLb, Citeseerb, Wiki-CSb, Corab and Chameleonb, respectively.
Researcher Affiliation Academia 1 School of Computation, Information and Technology, Technical University of Munich; 2 Munich Data Science Institute; 3 Munich Center for Machine Learning (MCML); Germany
Pseudocode No The paper describes methods and theorems but does not contain explicitly labeled pseudocode or algorithm blocks. The derivations are mathematical and descriptions are in prose.
Open Source Code Yes The code is available at https://github.com/saper0/qpcert.
Open Datasets Yes We use the real-world graph datasets Cora-ML (Bojchevski & Günnemann, 2018) and Citeseer (Giles et al., 1998) for multi-class certification. We evaluate binary class certification using Polblogs (Adamic & Glance, 2005), and by extracting the subgraphs containing the top two largest classes from Cora-ML, Citeseer, Wiki-CS (Mernyei & Cangea, 2020), Cora (McCallum et al., 2000) and Chameleon (Rozemberczki et al., 2021), referring to these as Cora-MLb, Citeseerb, Wiki-CSb, Corab and Chameleonb, respectively.
Dataset Splits Yes We choose 10 nodes per class for training for all datasets, except for Citeseer, for which we choose 20. No separate validation set is needed as we perform 4-fold cross-validation (CV) for hyperparameter tuning. All results are averaged over 5 seeds (multiclass datasets: 3 seeds) and reported with their standard deviation. The test set for collective certificates consists of all unlabeled nodes on CSBM and CBA, and random samples of 50 unlabeled nodes for real-world graphs. The sample-wise certificate is calculated on all unlabeled nodes.
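The per-class training split described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's actual code (the function name and the toy label array are assumptions):

```python
import numpy as np

def sample_training_nodes(labels, per_class=10, seed=0):
    """Pick `per_class` nodes of each class for training; all remaining
    nodes are treated as unlabeled (certificates are computed on these)."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        nodes = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(nodes, size=per_class, replace=False))
    train_idx = np.array(sorted(train_idx))
    mask = np.ones(len(labels), dtype=bool)
    mask[train_idx] = False
    unlabeled_idx = np.flatnonzero(mask)
    return train_idx, unlabeled_idx

# Toy example: 100 nodes over 3 classes (class sizes are illustrative)
labels = np.repeat([0, 1, 2], [40, 30, 30])
train_idx, unlabeled_idx = sample_training_nodes(labels, per_class=10, seed=0)
```

With 10 nodes per class and 3 classes this yields 30 labeled training nodes; repeating the sampling over several seeds matches the paper's averaging over 5 (or 3) seeds.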
Hardware Specification No We used Gurobi to solve the MILP problems and all our experiments are run on CPU on an internal cluster. The memory requirement to compute sample-wise and collective certificates depends on the length of the MILP solving process.
Software Dependencies Yes All results concern the infinite-width limit and are obtained by solving the MILPs in Thm. 1 and 2 using Gurobi 11.0.1 (Gurobi Optimization, LLC, 2023) and the GNNs' NTK as derived in Gosch et al. (2024) and Sabanayagam et al. (2023).
Experiment Setup Yes All other hyperparameters are chosen based on 4-fold CV, given in App. G.2. We define the row and symmetric normalizations as Srow = D̂^{-1} Â and Ssym = D̂^{-1/2} Â D̂^{-1/2}, with D̂ and Â the degree and adjacency matrices of the given graph G with an added self-loop. For CSBM, we set S to Srow for GCN, SGC, GCN Skip-α and GCN Skip-PC, and to Ssym for APPNP with α = 0.1. GIN and GraphSAGE use a fixed S. In the case of L = 1, the regularization parameter C is 0.001 for all GNNs except APPNP, where C = 0.5. For L = 2, C = 0.001 for all, except GCN with C = 0.25 and GCN Skip-α with C = 0.25. For L = 4, again C = 0.001 for all, except GCN with C = 0.25 and GCN Skip-α with C = 0.5.
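The two propagation matrices Srow = D̂^{-1} Â and Ssym = D̂^{-1/2} Â D̂^{-1/2} can be written out directly. A minimal NumPy sketch, assuming a dense adjacency matrix (the example graph is illustrative, not from the paper):

```python
import numpy as np

def propagation_matrices(A):
    """Row- and symmetric-normalized propagation matrices with a self-loop:
    Srow = D̂^{-1} Â  and  Ssym = D̂^{-1/2} Â D̂^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])           # Â: adjacency with added self-loops
    d = A_hat.sum(axis=1)                    # degrees of Â
    D_inv = np.diag(1.0 / d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S_row = D_inv @ A_hat
    S_sym = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return S_row, S_sym

# Tiny example: a path graph on 3 nodes
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
S_row, S_sym = propagation_matrices(A)
```

Every row of Srow sums to 1 (a random-walk matrix), while Ssym is symmetric; both are the standard GCN-style normalizations the setup refers to.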