Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking
Authors: Simon Zhang, Cheng Xin, Tamal K. Dey
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation. ... We show in Table 1 the comparison of PR-AUC scores amongst the baseline methods of HGNN, HGNNP, HNHN, HyperGCN, UniGIN, UniGAT, UniSAGE, their hyperedge-dropped versions, and "Our" method, which preprocesses the hypergraph to break symmetry during training. |
| Researcher Affiliation | Academia | Simon Zhang EMAIL Department of Computer Science Purdue University Cheng Xin EMAIL Department of Computer Science Rutgers University Tamal K. Dey EMAIL Department of Computer Science Purdue University |
| Pseudocode | Yes | Algorithm 1: A Symmetry Finding Algorithm. Data: Hypergraph H = (V, E), represented by its star expansion matrix H; L ∈ Z+ is the number of iterations to run GWL-1. Result: A pair of collections (R_V = {V_{R_j}}, R_E = {E_{R_j}}), where the R_j are disconnected subhypergraphs exhibiting symmetry in H that are indistinguishable by L-GWL-1. |
| Open Source Code | Yes | https://github.com/simonzhang00/Hypergraph Symmetry Breaking |
| Open Datasets | Yes | We evaluate our method on higher-order link prediction with many of the standard hypergraph neural network methods. Due to potential class imbalance, we measure the PR-AUC of higher-order link prediction on the hypergraph datasets. These datasets are: cat-edge-DAWN, cat-edge-music-blues-reviews, contact-high-school, contact-primary-school, email-Eu, cat-edge-madison-restaurants. These datasets range from social interactions as they develop over time, to collections of reviews, to drug combinations before overdose. We also evaluate on the amherst41 dataset, which is a graph dataset. ... All datasets are originally from Benson et al. (2018b) or are general hypergraph datasets provided in Sinha et al. (2015); Amburg et al. (2020a). |
| Dataset Splits | Yes | Data Splitting: For the hypergraph datasets, each hyperedge is paired with a timestamp (a real number). These timestamps record the physical time at which a higher-order interaction, represented by a hyperedge, occurs. We form a train-val-test split by letting the training set be the hyperedges with timestamps up to the 80th percentile, the validation set be the hyperedges with timestamps between the 80th and 85th percentiles, and the test set be the remaining hyperedges. ... For the graph dataset, the single graph is deterministically split 80/5/15 into train/val/test. |
| Hardware Specification | Yes | We perform experiments on a cluster of machines equipped with AMD MI100s GPUs and 112 shared AMD EPYC 7453 28-Core Processors with 2.6 PB shared RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We implement h(S, H) from Equation 7 as follows. Upon extracting the node representations from the hypergraph neural network, we use a multi-layer-perceptron (MLP) on each node representation, sum across such compositions, then apply a final MLP layer after the aggregation. We use the binary cross entropy loss on this multi-node representation for training. We always use 5 layers of hyper GNN convolutions, a hidden dimension of 1024, and a learning rate of 0.01. ... To get a large set of symmetric subhypergraphs, we run 2 iterations of GWL-1. ... Exactly 2000 epochs are used for training. |
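The chronological split quoted under "Dataset Splits" (train up to the 80th percentile of timestamps, validation between the 80th and 85th, test the rest) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `split_by_time` and its signature are assumptions.

```python
import numpy as np

def split_by_time(hyperedges, timestamps, train_pct=80, val_pct=85):
    """Chronological train/val/test split of hyperedges, following the
    paper's description: train = timestamps up to the 80th percentile,
    val = (80th, 85th], test = the remainder."""
    t = np.asarray(timestamps, dtype=float)
    t_train, t_val = np.percentile(t, [train_pct, val_pct])
    train = [e for e, ti in zip(hyperedges, t) if ti <= t_train]
    val = [e for e, ti in zip(hyperedges, t) if t_train < ti <= t_val]
    test = [e for e, ti in zip(hyperedges, t) if ti > t_val]
    return train, val, test
```

Splitting by percentile rather than by a fixed count keeps the split deterministic and respects the temporal ordering, so no future hyperedge leaks into training.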
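The scoring head described under "Experiment Setup" (an MLP applied to each node representation in a candidate hyperedge, summed, followed by a final MLP, trained with binary cross entropy) can be sketched in plain numpy. This is a minimal illustration of the stated architecture, not the paper's implementation of h(S, H); the helper names and all dimensions are assumptions.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # Two-layer perceptron with ReLU; stands in for the MLPs in the text.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def h_score(node_reprs, params_node, params_out):
    """Sketch of the scoring function h(S, H): apply an MLP to each
    node representation in the candidate hyperedge S, sum across nodes
    (a permutation-invariant aggregation), then apply a final MLP and
    a sigmoid so the output can be trained with binary cross entropy."""
    per_node = np.stack([mlp(z, *params_node) for z in node_reprs])
    pooled = per_node.sum(axis=0)        # sum across node compositions
    logit = mlp(pooled, *params_out)     # final MLP after aggregation
    return 1.0 / (1.0 + np.exp(-logit))  # probability of the hyperedge
```

Summation before the final MLP is what makes the score well defined on a *set* of nodes: reordering the nodes of S cannot change the output.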