Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking

Authors: Simon Zhang, Cheng Xin, Tamal K. Dey

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation. ... We show in Table 1 the comparison of PR-AUC scores amongst the baseline methods of HGNN, HGNNP, HNHN, HyperGCN, UniGIN, UniGAT, UniSAGE, their hyperedge dropped versions, and "Our" method, which preprocesses the hypergraph to break symmetry during training.
Researcher Affiliation Academia Simon Zhang EMAIL Department of Computer Science Purdue University Cheng Xin EMAIL Department of Computer Science Rutgers University Tamal K. Dey EMAIL Department of Computer Science Purdue University
Pseudocode Yes Algorithm 1: A Symmetry Finding Algorithm. Data: Hypergraph H = (V, E), represented by its star expansion matrix H; L ∈ Z+ is the number of iterations to run GWL-1. Result: A pair of collections (R_V = {V_{R_j}}, R_E = {E_{R_j}}), where the R_j are disconnected subhypergraphs exhibiting symmetry in H that are indistinguishable by L-GWL-1.
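The quoted pseudocode is not reproduced in full here. As a rough illustration of the WL-style refinement it builds on, the sketch below runs L rounds of color refinement over a hypergraph's star expansion (alternating node-to-hyperedge and hyperedge-to-node updates) and then groups nodes whose colors collide, i.e. nodes the refinement cannot distinguish. The function name, input format, and use of `hash` for color compression are assumptions for this sketch, not the paper's API.

```python
from collections import defaultdict

def wl_refine_star_expansion(num_nodes, hyperedges, L=2):
    """Return classes of nodes left indistinguishable after L refinement rounds."""
    node_color = {v: 0 for v in range(num_nodes)}
    edge_color = {i: 0 for i in range(len(hyperedges))}
    incident = defaultdict(list)  # node -> ids of incident hyperedges
    for i, e in enumerate(hyperedges):
        for v in e:
            incident[v].append(i)
    for _ in range(L):
        # refresh hyperedge colors from the multiset of member node colors
        edge_color = {
            i: hash((edge_color[i], tuple(sorted(node_color[v] for v in e))))
            for i, e in enumerate(hyperedges)
        }
        # refresh node colors from the multiset of incident hyperedge colors
        node_color = {
            v: hash((node_color[v], tuple(sorted(edge_color[i] for i in incident[v]))))
            for v in node_color
        }
    classes = defaultdict(list)
    for v, c in node_color.items():
        classes[c].append(v)
    return [vs for vs in classes.values() if len(vs) > 1]

# two disjoint 3-node hyperedges look identical to the refinement,
# so all six nodes fall into a single symmetry class
print(wl_refine_star_expansion(6, [frozenset({0, 1, 2}), frozenset({3, 4, 5})]))
```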
Open Source Code Yes https://github.com/simonzhang00/Hypergraph Symmetry Breaking
Open Datasets Yes We evaluate our method on higher order link prediction with many of the standard hypergraph neural network methods. Due to potential class imbalance, we measure the PR-AUC of higher order link prediction on the hypergraph datasets. These datasets are: cat-edge-DAWN, cat-edge-music-blues-reviews, contact-high-school, contact-primary-school, email-Eu, cat-edge-madison-restaurants. These datasets range from representing social interactions as they develop over time to collections of reviews to drug combinations before overdose. We also evaluate on the amherst41 dataset, which is a graph dataset. ... All datasets are originally from Benson et al. (2018b) or are general hypergraph datasets provided in Sinha et al. (2015); Amburg et al. (2020a).
Dataset Splits Yes Data Splitting: For the hypergraph datasets, each hyperedge is paired with a timestamp (a real number). These timestamps are the physical times at which the higher order interactions, represented by hyperedges, occur. We form a train-val-test split by letting the train set be the hyperedges with timestamps up to the 80th percentile, and the validation set be the hyperedges with timestamps between the 80th and 85th percentiles. The test hyperedges are the remaining hyperedges. ... For the graph dataset, the single graph is deterministically split into 80/5/15 for train/val/test.
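The temporal split described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the toy hyperedges and timestamps are invented for the example.

```python
import numpy as np

def temporal_split(hyperedges, timestamps, train_pct=80, val_pct=85):
    """Split hyperedges into train/val/test by timestamp percentiles."""
    ts = np.asarray(timestamps, dtype=float)
    t_train = np.percentile(ts, train_pct)
    t_val = np.percentile(ts, val_pct)
    train = [e for e, t in zip(hyperedges, ts) if t <= t_train]
    val = [e for e, t in zip(hyperedges, ts) if t_train < t <= t_val]
    test = [e for e, t in zip(hyperedges, ts) if t > t_val]
    return train, val, test

# toy example: 100 hyperedges arriving at evenly spaced times
edges = [{i, i + 1} for i in range(100)]
times = list(range(100))
train, val, test = temporal_split(edges, times)
print(len(train), len(val), len(test))  # → 80 5 15
```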
Hardware Specification Yes We perform experiments on a cluster of machines equipped with AMD MI100s GPUs and 112 shared AMD EPYC 7453 28-Core Processors with 2.6 PB shared RAM.
Software Dependencies No The paper does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes We implement h(S, H) from Equation 7 as follows. Upon extracting the node representations from the hypergraph neural network, we use a multi-layer-perceptron (MLP) on each node representation, sum across such compositions, then apply a final MLP layer after the aggregation. We use the binary cross entropy loss on this multi-node representation for training. We always use 5 layers of hyper GNN convolutions, a hidden dimension of 1024, and a learning rate of 0.01. ... To get a large set of symmetric subhypergraphs, we run 2 iterations of GWL-1. ... Exactly 2000 epochs are used for training.
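The described scoring head h(S, H) can be sketched as a permutation-invariant sum over per-node MLP outputs followed by a final layer producing a probability for the binary cross entropy loss. The toy dimensions and random weights below are illustrative stand-ins; in the paper this head sits on top of 5 layers of hyper GNN convolutions with hidden dimension 1024.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 16  # toy dimensions; the paper uses hidden dimension 1024

# illustrative random parameters: a shared two-layer per-node MLP and a final layer
W1, b1 = 0.1 * rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = 0.1 * rng.normal(size=(h, h)), np.zeros(h)
w_out, b_out = 0.1 * rng.normal(size=h), 0.0

def score(node_reprs):
    """Probability that a set of node representations forms a hyperedge."""
    # per-node MLP, then sum: the aggregation is invariant to node order
    pooled = sum(np.maximum(x @ W1 + b1, 0) @ W2 + b2 for x in node_reprs)
    # final layer applied after the aggregation, squashed to a probability
    logit = np.maximum(pooled, 0) @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))

S = rng.normal(size=(3, d))           # representations of a 3-node candidate
p = score(S)
print(0.0 < p < 1.0)                  # valid probability for BCE training
print(np.isclose(p, score(S[::-1])))  # sum pooling ignores node order
```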