Zero-Shot Generalization of GNNs over Distinct Attribute Domains
Authors: Yangyi Shen, Jincheng Zhou, Beatrice Bevilacqua, Joshua Robinson, Charilaos Kanatsoulis, Jure Leskovec, Bruno Ribeiro
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, STAGE demonstrates strong zero-shot performance on medium-sized datasets: when trained on multiple graph datasets with different attribute spaces (varying in types and number) and evaluated on graphs with entirely new attributes, STAGE achieves a relative improvement in Hits@1 of 40% to 103% in link prediction and a 10% improvement in node classification compared to state-of-the-art baselines. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University, Stanford, USA; 2Department of Computer Science, Purdue University, West Lafayette, USA. Correspondence to: Yangyi Shen <EMAIL>. |
| Pseudocode | Yes | A. Pseudocode of STAGE Algorithm. In this section, we present the detailed pseudocode for STAGE's two main components: (1) the STAGE-edge-graphs construction algorithm (Algorithm 1) that captures statistical dependencies between attributes, and (2) the forward pass (Algorithm 2) that uses these STAGE-edge-graphs to generate the final graph representation. The STAGE-edge-graphs construction creates a complete graph for each edge in the input graph, where nodes represent attributes and edge weights capture conditional probabilities between attribute pairs. The algorithm handles both totally ordered and unordered attributes. The forward pass then processes these STAGE-edge-graphs using two GNNs: one to generate edge embeddings from the STAGE-edge-graphs, and another to produce the final graph representation using these embeddings. |
| Open Source Code | Yes | Our code is available at https://github.com/snap-stanford/stage-gnn/. |
| Open Datasets | Yes | E-Commerce Stores dataset (link prediction). We use data from a multi-category store (Kechinov, 2020) containing customer-product interactions (purchases, cart additions, views) over time. H&M dataset (link prediction). We use the H&M Personalized Fashion Recommendations dataset (Kaggle, 2021), which contains transactions from a large fashion retailer... Social network datasets (node classification): Friendster and Pokec. We evaluate STAGE on two online social networks from different regions and user bases: Friendster (Teixeira et al., 2019) and Pokec (SNAP, 2012). |
| Dataset Splits | Yes | We evaluate the performance of all methods on zero-shot generalization on the E-Commerce Stores dataset, training on four categories and testing on the held-out fifth category. To simulate distinct single-category retailers, we partition the dataset into five domains, each representing a specialized store: shoes, refrigerators, desktops, smartphones, and beds. We first filtered out the nodes that contain invalid attributes and then sampled the 150 most popular female and male nodes each before picking the largest connected components of the graph formed by the popular nodes. |
| Hardware Specification | Yes | Time is measured on an 80GB A100 GPU and averaged across 3 training epochs. |
| Software Dependencies | No | The paper mentions 'NBFNet-PyG', 'GINEConv (Hu et al., 2020)', and 'GCN (Kipf & Welling, 2016)', but it does not specify version numbers for these software components or any other libraries used, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | For Figure 3, Table 1, and Figure 4: We use the default NBFNet-PyG configuration for the inductive WN18RR dataset (Zhu et al., 2021c), except for a few specific parameters. The input dimension for the node attribute is set to 256, and the model includes six hidden layers with dimensions [256, 256, 256, 256, 256, 256], making a total of seven layers. For STAGE, we use 1 layer of GINEConv (Hu et al., 2020) for the GNN on the STAGE-edge-graph, which produces an edge representation of dimension 256. We also append an extra p value to each edge in the STAGE-edge-graph for expressivity. All models are trained with a batch size of 32 over 30 epochs. For Table 2, ... The input attribute dimension is set to 64, with 128 as the dimension of hidden channels. The model uses 2 layers of GINEConv (Hu et al., 2020). The learning rate for the optimizer was set to 0.0001, with a dropout rate of 0.5 to mitigate overfitting. Training was carried out for 400 epochs. Additionally, STAGE is deployed with 2 layers of GNN on the STAGE-edge-graph with GINEConv and an edge representation of dimension 32. |
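
The STAGE-edge-graph construction quoted in the Pseudocode row can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation: the function name, the data layout, and the use of empirical co-occurrence ratios as conditional-probability estimates are all our assumptions.

```python
from itertools import combinations


def build_stage_edge_graph(edge_attrs, pair_counts, attr_counts):
    """Build the complete attribute graph for one input-graph edge.

    edge_attrs: attribute ids associated with this edge's endpoints.
    pair_counts: co-occurrence counts over the training graph, keyed by
        sorted attribute pairs, e.g. {("color", "size"): 2}.
    attr_counts: marginal occurrence counts per attribute.
    Returns a dict mapping ordered pairs (a, b) to an estimate of P(a | b),
    i.e. the directed edge weight from attribute node b to attribute node a.
    """
    weights = {}
    for a, b in combinations(sorted(edge_attrs), 2):
        joint = pair_counts.get((a, b), 0)
        if attr_counts.get(b):
            weights[(a, b)] = joint / attr_counts[b]  # ~ P(a | b)
        if attr_counts.get(a):
            weights[(b, a)] = joint / attr_counts[a]  # ~ P(b | a)
    return weights
```

A toy usage: if "color" occurs 4 times, "size" 2 times, and they co-occur twice, the sketch yields P(color | size) = 1.0 and P(size | color) = 0.5 as the two directed edge weights between those attribute nodes.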
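
The hyperparameters quoted in the Experiment Setup row can be collected into plain config dicts for quick reference. The values below are taken verbatim from the excerpt; the dict and key names are our own convenience labels, not identifiers from the authors' code.

```python
# Link-prediction runs (Figure 3, Table 1, Figure 4), on top of the
# default NBFNet-PyG configuration for inductive WN18RR.
LINK_PREDICTION_CFG = {
    "input_dim": 256,           # node-attribute input dimension
    "hidden_dims": [256] * 6,   # six hidden layers; seven layers total
    "stage_gnn_layers": 1,      # GINEConv layers on the STAGE-edge-graph
    "edge_repr_dim": 256,       # edge-representation dimension
    "batch_size": 32,
    "epochs": 30,
}

# Node-classification runs (Table 2).
NODE_CLASSIFICATION_CFG = {
    "input_dim": 64,            # input attribute dimension
    "hidden_dim": 128,          # hidden-channel dimension
    "gnn_layers": 2,            # GINEConv layers
    "lr": 1e-4,
    "dropout": 0.5,
    "epochs": 400,
    "stage_gnn_layers": 2,      # GNN layers on the STAGE-edge-graph
    "edge_repr_dim": 32,
}
```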