GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks

Authors: Dingyi Zhuang, Chonghe Jiang, Yunhan Zheng, Shenhao Wang, Jinhua Zhao

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our method outperforms state-of-the-art calibration techniques, reducing expected calibration error (ECE) by 25% across 10 GNN benchmark datasets. Additionally, GETS is computationally efficient, scalable, and capable of selecting effective input combinations for improved calibration performance. The implementation is available at https://github.com/ZhuangDingyi/GETS/. Evidence sections: 5 Experiments; 5.1 Experimental Setup; 5.2 Confidence Calibration Evaluation; 5.3 Time Complexity; 5.4 Expert Selection; 5.5 Ablation Studies.
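For reference on the ECE metric cited in the row above, here is a minimal NumPy sketch of expected calibration error using equal-width confidence bins. The binning scheme and bin count are common defaults, not taken from the paper, and `expected_calibration_error` is a name chosen here for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE: bin predictions by confidence, then take the weighted average of
    |empirical accuracy - mean confidence| over the bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            weight = in_bin.mean()  # fraction of samples in this bin
            ece += weight * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy example: two bins, each slightly miscalibrated.
conf = np.array([0.9, 0.9, 0.6, 0.6])
pred = np.array([1, 1, 0, 0])
true = np.array([1, 1, 0, 1])
print(expected_calibration_error(conf, pred, true))  # ≈ 0.1
```

A perfectly calibrated model (confidence equal to empirical accuracy in every bin) yields an ECE of 0, which is the quantity the paper reports reducing by 25%.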
Researcher Affiliation Academia Dingyi Zhuang, Massachusetts Institute of Technology; Chonghe Jiang, The Chinese University of Hong Kong; Yunhan Zheng, Singapore-MIT Alliance for Research and Technology (SMART); Shenhao Wang, University of Florida; Jinhua Zhao, Massachusetts Institute of Technology
Pseudocode No The paper describes methods and equations but does not include any explicitly labeled pseudocode blocks or algorithms in a structured format.
Open Source Code Yes The implementation is available at https://github.com/ZhuangDingyi/GETS/.
Open Datasets Yes We include the 10 commonly used graph classification networks for a thorough evaluation. The data summary is given in Table 1; refer to Appendix A.2 for their sources. ... We evaluated our method on several widely used benchmark datasets, all accessible via the Deep Graph Library (DGL). These datasets encompass a variety of graph types and complexities, allowing us to assess the robustness and generalizability of our calibration approach. ... Citation Networks (Cora, Citeseer, Pubmed, Cora-Full): In these datasets (Sen et al., 2008; McCallum et al., 2000; Giles et al., 1998)
Dataset Splits Yes The train-val-test split is 20-10-70 (Hsu et al., 2022; Tang et al., 2024); note that uncertainty calibration models are trained on the validation set, which is also referred to as the calibration set. We randomly generate 10 different splits of training, validation, and testing inputs and run the models 10 times on different splits.
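The 20-10-70 random-split protocol quoted above can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the function name and seed handling are assumptions.

```python
import numpy as np

def random_split(num_nodes, train_frac=0.2, val_frac=0.1, seed=0):
    """Random 20-10-70 train/val/test node split. The validation set doubles
    as the calibration set on which the calibration model is trained."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return (perm[:n_train],
            perm[n_train:n_train + n_val],
            perm[n_train + n_val:])

# 10 independent splits, as in the paper's evaluation protocol
# (2708 is the node count of Cora, used here only as an example).
splits = [random_split(2708, seed=s) for s in range(10)]
```

Running the model once per split and averaging over the 10 runs gives the reported results.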
Hardware Specification Yes All our experiments are implemented on a machine with Ubuntu 22.04, with 2 AMD EPYC 9754 128-Core Processors, 1TB RAM, and 10 NVIDIA L40S 48GB GPUs.
Software Dependencies No The paper mentions using 'torch.nn.Embedding' and 'Deep Graph Library (DGL)', but does not specify version numbers for these or other software components like Python, PyTorch, or CUDA.
Experiment Setup Yes For the base GNN classification model (i.e., the uncalibrated model), we follow the architecture and parameter setup outlined by Kipf & Welling (2016); Veličković et al. (2017); Xu et al. (2018), with modifications to achieve optimal performance. Specifically, we use a two-layer GCN, GAT, or GIN model and tune the hidden dimension from the set {16, 32, 64}. We experiment with dropout rates ranging from 0.5 to 1, and we do not apply any additional normalization. During training, we use a learning rate of 1e-2. We tune the weight decay parameter to prevent overfitting and consider adding early stopping with patience of 50 epochs. The model is trained for a maximum of 200 epochs to ensure convergence. The specifics are summarized in Table 4. ... For all experiments, the pre-trained GNN classifiers are frozen, and the predicted logits z from the validation set are fed into our calibration model as inputs. ... Table 5: Summary of GETS Parameters Across Datasets (Hidden Dim, Dropout, Num Layers, Learning Rate, Weight Decay are specified for each dataset).
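The early-stopping regime described above (patience of 50 epochs, maximum of 200) can be sketched framework-agnostically; the function name is hypothetical and `step_fn` stands in for one epoch of training plus validation, which the paper does not spell out at this level.

```python
def train_with_early_stopping(step_fn, max_epochs=200, patience=50):
    """Run up to max_epochs; stop once validation loss has not improved
    for `patience` consecutive epochs.

    step_fn(epoch) -> validation loss after training for that epoch.
    Returns (best validation loss, epoch at which it occurred)."""
    best, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        val_loss = step_fn(epoch)
        if val_loss < best:
            best, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted
    return best, best_epoch
```

In the paper's setup this loop would wrap each (hidden dimension, dropout, weight decay) configuration drawn from the grid described above, with the learning rate fixed at 1e-2.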