Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking

Authors: Simon Zhang, Cheng Xin, Tamal K. Dey

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation. ... We show in Table 1 the comparison of PR-AUC scores amongst the baseline methods of HGNN, HGNNP, HNHN, HyperGCN, UniGIN, UniGAT, UniSAGE, their hyperedge dropped versions, and "Our" method, which preprocesses the hypergraph to break symmetry during training.
Researcher Affiliation Academia Simon Zhang EMAIL Department of Computer Science Purdue University Cheng Xin EMAIL Department of Computer Science Rutgers University Tamal K. Dey EMAIL Department of Computer Science Purdue University
Pseudocode Yes Algorithm 1: A Symmetry Finding Algorithm. Data: Hypergraph H = (V, E), represented by its star expansion matrix H; L ∈ Z+ is the number of iterations to run GWL-1. Result: A pair of collections (R_V = {V_{R_j}}, R_E = {E_{R_j}}), where the R_j are disconnected subhypergraphs exhibiting symmetry in H that are indistinguishable by L-GWL-1.
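The quoted pseudocode is not reproduced in full here. As a rough illustration of the WL-style refinement it builds on, the sketch below runs L rounds of color refinement over a hypergraph's star expansion (alternating node-to-hyperedge and hyperedge-to-node updates) and then groups nodes whose colors collide, i.e. nodes the refinement cannot distinguish. The function name, input format, and use of `hash` for color compression are assumptions for this sketch, not the paper's API.

```python
from collections import defaultdict

def wl_refine_star_expansion(num_nodes, hyperedges, L=2):
    """Return classes of nodes left indistinguishable after L refinement rounds."""
    node_color = {v: 0 for v in range(num_nodes)}
    edge_color = {i: 0 for i in range(len(hyperedges))}
    incident = defaultdict(list)  # node -> ids of incident hyperedges
    for i, e in enumerate(hyperedges):
        for v in e:
            incident[v].append(i)
    for _ in range(L):
        # refresh hyperedge colors from the multiset of member node colors
        edge_color = {
            i: hash((edge_color[i], tuple(sorted(node_color[v] for v in e))))
            for i, e in enumerate(hyperedges)
        }
        # refresh node colors from the multiset of incident hyperedge colors
        node_color = {
            v: hash((node_color[v], tuple(sorted(edge_color[i] for i in incident[v]))))
            for v in node_color
        }
    classes = defaultdict(list)
    for v, c in node_color.items():
        classes[c].append(v)
    return [vs for vs in classes.values() if len(vs) > 1]

# two disjoint 3-node hyperedges look identical to the refinement,
# so all six nodes fall into a single symmetry class
print(wl_refine_star_expansion(6, [frozenset({0, 1, 2}), frozenset({3, 4, 5})]))
```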
Open Source Code Yes https://github.com/simonzhang00/Hypergraph Symmetry Breaking
Open Datasets Yes We evaluate our method on higher order link prediction with many of the standard hypergraph neural network methods. Due to potential class imbalance, we measure the PR-AUC of higher order link prediction on the hypergraph datasets. These datasets are: cat-edge-DAWN, cat-edge-music-blues-reviews, contact-high-school, contact-primary-school, email-Eu, cat-edge-madison-restaurants. These datasets range from representing social interactions as they develop over time to collections of reviews to drug combinations before overdose. We also evaluate on the amherst41 dataset, which is a graph dataset. ... All datasets are originally from Benson et al. (2018b) or are general hypergraph datasets provided in Sinha et al. (2015); Amburg et al. (2020a).
Dataset Splits Yes Data Splitting: For the hypergraph datasets, each hyperedge is paired with a timestamp (a real number). These timestamps are the physical times at which the higher order interactions, represented by hyperedges, occur. We form a train-val-test split by letting the train set be the hyperedges with timestamps up to the 80th percentile, and the validation set be the hyperedges with timestamps between the 80th and 85th percentiles. The test hyperedges are the remaining hyperedges. ... For the graph dataset, the single graph is deterministically split into 80/5/15 for train/val/test.
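The temporal split described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the toy hyperedges and timestamps are invented for the example.

```python
import numpy as np

def temporal_split(hyperedges, timestamps, train_pct=80, val_pct=85):
    """Split hyperedges into train/val/test by timestamp percentiles."""
    ts = np.asarray(timestamps, dtype=float)
    t_train = np.percentile(ts, train_pct)
    t_val = np.percentile(ts, val_pct)
    train = [e for e, t in zip(hyperedges, ts) if t <= t_train]
    val = [e for e, t in zip(hyperedges, ts) if t_train < t <= t_val]
    test = [e for e, t in zip(hyperedges, ts) if t > t_val]
    return train, val, test

# toy example: 100 hyperedges arriving at evenly spaced times
edges = [{i, i + 1} for i in range(100)]
times = list(range(100))
train, val, test = temporal_split(edges, times)
print(len(train), len(val), len(test))  # → 80 5 15
```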
Hardware Specification Yes We perform experiments on a cluster of machines equipped with AMD MI100s GPUs and 112 shared AMD EPYC 7453 28-Core Processors with 2.6 PB shared RAM.
Software Dependencies No The paper does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes We implement h(S, H) from Equation 7 as follows. Upon extracting the node representations from the hypergraph neural network, we use a multi-layer-perceptron (MLP) on each node representation, sum across such compositions, then apply a final MLP layer after the aggregation. We use the binary cross entropy loss on this multi-node representation for training. We always use 5 layers of hyper GNN convolutions, a hidden dimension of 1024, and a learning rate of 0.01. ... To get a large set of symmetric subhypergraphs, we run 2 iterations of GWL-1. ... Exactly 2000 epochs are used for training.
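The described scoring head h(S, H) can be sketched as a permutation-invariant sum over per-node MLP outputs followed by a final layer producing a probability for the binary cross entropy loss. The toy dimensions and random weights below are illustrative stand-ins; in the paper this head sits on top of 5 layers of hyper GNN convolutions with hidden dimension 1024.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16  # toy dimensions; the paper uses hidden dimension 1024

# illustrative random parameters: a shared two-layer per-node MLP and a final layer
W1, b1 = 0.1 * rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = 0.1 * rng.normal(size=(h, h)), np.zeros(h)
w_out, b_out = 0.1 * rng.normal(size=h), 0.0

def score(node_reprs):
    """Probability that a set of node representations forms a hyperedge."""
    # per-node MLP, then sum: the aggregation is invariant to node order
    pooled = sum(np.maximum(x @ W1 + b1, 0) @ W2 + b2 for x in node_reprs)
    # final layer applied after the aggregation, squashed to a probability
    logit = np.maximum(pooled, 0) @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))

S = rng.normal(size=(3, d))           # representations of a 3-node candidate
p = score(S)
print(0.0 < p < 1.0)                  # valid probability for BCE training
print(np.isclose(p, score(S[::-1])))  # sum pooling ignores node order
```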