Towards Bridging Generalization and Expressivity of Graph Neural Networks

Authors: Shouheng Li, Floris Geerts, Dongwoo Kim, Qing Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through case studies and experiments on real-world datasets, we demonstrate that our theoretical findings align with empirical results, offering a deeper understanding of how expressivity can enhance GNN generalization." From Section 7 (Experiments), Tasks and Datasets: "We conduct graph classification experiments on six widely used benchmark datasets: ENZYMES, PROTEINS, and MUTAG from the TU dataset collection (Morris et al., 2020a), as well as SIDER and BACE from the molecular dataset collection (Wu et al., 2017)."
Researcher Affiliation | Academia | "1 School of Computing, Australian National University, Australia; 2 Department of Computer Science, University of Antwerp, Belgium; 3 CSE & GSAI, POSTECH, South Korea; 4 Data61, CSIRO, Australia"
Pseudocode | Yes | "Algorithm 1: An algorithm to compute the bound in Theorem D.2"
Open Source Code | Yes | "The code implementation is available at https://github.com/seanli3/hom_gen."
Open Datasets | Yes | "We conduct graph classification experiments on six widely used benchmark datasets: ENZYMES, PROTEINS, and MUTAG from the TU dataset collection (Morris et al., 2020a), as well as SIDER and BACE from the molecular dataset collection (Wu et al., 2017)."
Dataset Splits | Yes | "Each dataset is randomly divided into training and test sets following a 90%/10% split."
Hardware Specification | No | The paper does not describe the hardware used for its experiments; no GPU/CPU models or processor types are mentioned, only the general experimental setup.
Software Dependencies | No | The paper does not list ancillary software with version numbers. It mentions using a softmax function, but names no specific libraries or frameworks with versions.
Experiment Setup | Yes | "Each classification task is trained for 400 epochs, with five independent runs to report the mean and standard deviation of the results. Consistent with the setup in Tang and Liu (2023); Morris et al. (2023); Cong et al. (2021), we eliminate the use of regularization techniques such as dropout and weight decay. A batch size of 128 is utilized, with a learning rate set to 10^-3, and the hidden layer dimension fixed at 64. The margin loss function is employed with a margin parameter γ = 1. [...] For all experiments, we set the confidence level δ to 0.1, yielding bounds with high probability."
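The quoted protocol (90%/10% random split, five independent runs, and the reported hyperparameters) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the `CONFIG` dictionary only collects the values quoted above, and `train_test_split` is a hypothetical helper; the dataset size of 600 used in the example matches ENZYMES but model training itself is omitted.

```python
import random

# Hyperparameters as quoted in the paper's experiment setup (values only;
# the structure of this dictionary is illustrative).
CONFIG = {
    "epochs": 400,
    "runs": 5,                # five independent runs, reporting mean and std
    "batch_size": 128,
    "learning_rate": 1e-3,
    "hidden_dim": 64,
    "margin": 1.0,            # margin parameter gamma of the margin loss
    "confidence_delta": 0.1,  # confidence level for the generalization bounds
    "dropout": 0.0,           # regularization disabled, per the reported setup
    "weight_decay": 0.0,
}

def train_test_split(indices, test_fraction=0.1, seed=0):
    """Randomly split dataset indices into train/test sets (90%/10%)."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Example: a 600-graph dataset (the size of ENZYMES) splits 540/60.
train_idx, test_idx = train_test_split(range(600))
print(len(train_idx), len(test_idx))  # → 540 60
```

Each of the five runs would reuse `CONFIG` with a fresh seed for the split and the model initialization, which is what makes the reported mean and standard deviation meaningful.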