Exact Certification of (Graph) Neural Networks Against Label Poisoning
Authors: Mahalakshmi Sabanayagam, Lukas Gosch, Stephan Günnemann, Debarghya Ghoshdastidar
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sec. 4.1 we thoroughly investigate our sample-wise and collective certificates. Sec. 4.2 discusses in detail the effect of architectural choices and graph structure. Datasets. We use the real-world graph datasets Cora-ML (Bojchevski & Günnemann, 2018) and Citeseer (Giles et al., 1998) for multi-class certification. We evaluate binary class certification using Polblogs (Adamic & Glance, 2005), and by extracting the subgraphs containing the top two largest classes from Cora-ML, Citeseer, Wiki-CS (Mernyei & Cangea, 2020), Cora (McCallum et al., 2000) and Chameleon (Rozemberczki et al., 2021), referring to these as Cora-MLb, Citeseerb, Wiki-CSb, Corab and Chameleonb, respectively. |
| Researcher Affiliation | Academia | 1 School of Computation, Information and Technology, Technical University of Munich 2 Munich Data Science Institute 3 Munich Center for Machine Learning (MCML); Germany |
| Pseudocode | No | The paper describes methods and theorems but does not contain explicitly labeled pseudocode or algorithm blocks. The derivations are mathematical and descriptions are in prose. |
| Open Source Code | Yes | The code is available at https://github.com/saper0/qpcert. |
| Open Datasets | Yes | We use the real-world graph datasets Cora-ML (Bojchevski & Günnemann, 2018) and Citeseer (Giles et al., 1998) for multi-class certification. We evaluate binary class certification using Polblogs (Adamic & Glance, 2005), and by extracting the subgraphs containing the top two largest classes from Cora-ML, Citeseer, Wiki-CS (Mernyei & Cangea, 2020), Cora (McCallum et al., 2000) and Chameleon (Rozemberczki et al., 2021), referring to these as Cora-MLb, Citeseerb, Wiki-CSb, Corab and Chameleonb, respectively. |
| Dataset Splits | Yes | We choose 10 nodes per class for training for all datasets, except for Citeseer, for which we choose 20. No separate validation set is needed as we perform 4-fold cross-validation (CV) for hyperparameter tuning. All results are averaged over 5 seeds (multiclass datasets: 3 seeds) and reported with their standard deviation. The test set for collective certificates consists of all unlabeled nodes on CSBM and CBA, and random samples of 50 unlabeled nodes for real-world graphs. The samplewise certificate is calculated on all unlabeled nodes. |
| Hardware Specification | No | We used Gurobi to solve the MILP problems and all our experiments are run on CPU on an internal cluster. The memory requirement to compute sample-wise and collective certificates depends on the length of the MILP solving process. |
| Software Dependencies | Yes | All results concern the infinite-width limit and are obtained by solving the MILPs in Thm. 1 and 2 using Gurobi 11.0.1 (Gurobi Optimization, LLC, 2023) and the GNNs' NTK as derived in Gosch et al. (2024) and Sabanayagam et al. (2023). |
| Experiment Setup | Yes | All other hyperparameters are chosen based on 4-fold CV, given in App. G.2. We define the row and symmetric normalizations as Srow = D̂^{-1} Â and Ssym = D̂^{-1/2} Â D̂^{-1/2}, with D̂ and Â the degree and adjacency matrices of the given graph G with an added self-loop. For CSBM, we set S to Srow for GCN, SGC, GCN Skip-α and GCN Skip-PC, and to Ssym for APPNP with α = 0.1. GIN and GraphSAGE use a fixed S. In the case of L = 1, the regularization parameter C is 0.001 for all GNNs except APPNP, where C = 0.5. For L = 2, C = 0.001 for all, except GCN with C = 0.25 and GCN Skip-α with C = 0.25. For L = 4, again C = 0.001 for all, except GCN with C = 0.25 and GCN Skip-α with C = 0.5. |
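The row and symmetric normalizations quoted in the Experiment Setup row can be sketched in NumPy. This is a minimal illustration on a hypothetical 3-node toy graph, not the authors' code:

```python
import numpy as np

# Toy undirected graph on 3 nodes (hypothetical, not a paper dataset).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Add self-loops, matching the paper's definition of the adjacency matrix.
A_hat = A + np.eye(3)
# Degree matrix of the self-loop-augmented graph.
D_hat = np.diag(A_hat.sum(axis=1))

# Row normalization: S_row = D^{-1} A (rows sum to 1).
S_row = np.linalg.inv(D_hat) @ A_hat

# Symmetric normalization: S_sym = D^{-1/2} A D^{-1/2} (symmetric matrix).
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat)))
S_sym = D_inv_sqrt @ A_hat @ D_inv_sqrt
```

Row normalization makes S_row a random-walk transition matrix, while the symmetric variant preserves symmetry of the propagation operator, which matters for spectral arguments.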
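The split protocol quoted under Dataset Splits (10 labeled nodes per class for training, 4-fold cross-validation over the training nodes instead of a separate validation set, all remaining nodes unlabeled) can be sketched as follows. The label array and graph size here are hypothetical placeholders, not one of the paper's datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary labels for a 200-node graph (illustration only).
y = rng.integers(0, 2, size=200)

# 10 training nodes per class, as in the quoted protocol.
train_idx = np.concatenate([
    rng.choice(np.flatnonzero(y == c), size=10, replace=False)
    for c in range(2)
])
# Everything else is treated as unlabeled / test pool.
unlabeled_idx = np.setdiff1d(np.arange(len(y)), train_idx)

# 4-fold CV over the training nodes for hyperparameter tuning
# (no separate validation set, matching the quoted description).
perm = rng.permutation(train_idx)
folds = np.array_split(perm, 4)
for val_fold in folds:
    fit_fold = np.setdiff1d(train_idx, val_fold)
    # ...train on fit_fold, score on val_fold for each hyperparameter...
```

With 20 training nodes total, each CV fold holds out 5 nodes; certificates are then evaluated on the unlabeled nodes.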