Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning
Authors: Linfeng Liu, Xu Han, Dawei Zhou, Liping Liu
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed model establishes new state-of-the-art results across seven benchmark datasets. Extensive analysis of the model indicates that the proposed model can reasonably prune the target graph for SED computation. The implementation of our algorithm is released at our Github repo: https://github.com/tufts-ml/Prune4SED. |
| Researcher Affiliation | Collaboration | Linfeng Liu (Meta, Boston); Xu Han (Department of Computer Science, Tufts University); Dawei Zhou (Department of Computer Science, Virginia Tech); Li-Ping Liu (Department of Computer Science, Tufts University) |
| Pseudocode | Yes | Algorithm 1 Prune4SED. Input: G_q, G_t. 1: Query-aware representation learning: 2: for l = 1 to L do 3: (H̃_t^l, H_q^l) = (EMB(X_t), EMB(X_q)) if l = 1, else (GAT(H_t^{l-1}, E_t), GAT(H_q^{l-1}, E_q)) 4: H_t^l = QAL(H̃_t^l, H_q^l) {QAL block from equation 10} 5: end for 6: H_t = MLP(CONCAT(H_t^1, ..., H_t^L)) 7: for m = 1 to M do {Multi-head pruning} 8: G'_{t,m} = PRUNE_m(H_t, G_t) 9: ŷ_m = PRED(G'_{t,m}, G_q) 10: end for 11: ŷ = MEAN(ŷ_1, ..., ŷ_M) 12: return ŷ |
| Open Source Code | Yes | The implementation of our algorithm is released at our Github repo: https://github.com/tufts-ml/Prune4SED. |
| Open Datasets | Yes | We use seven datasets (AIDS, CiteSeer, Cora_ML, Amazon, DBLP, PubMed, and Protein) to evaluate our model for SED approximation. B.1 provides more descriptions about these datasets. Query graphs of the AIDS dataset are known functional groups from Ranu & Singh (2009). https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data |
| Dataset Splits | Yes | From each dataset, we randomly pair target and query graphs to get 100K pairs for training, 10K for validation, and another 10K for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It mentions "Part of the computation resource is provided by Amazon AWS" and "The solver runs on a 64 core machine", but these are too general or refer to the solver environment, not their specific experimental setup. |
| Software Dependencies | No | The paper mentions software components like "GATv2Conv (Brody et al., 2021)", "MIP-F2 solver (Lerouge et al., 2017)", and "GEDLIB (Blumenthal et al., 2019; 2020)", but it does not specify exact version numbers for these or other key software components used in their experiments. |
| Experiment Setup | Yes | By default, we use L = 5 stages for Prune4SED. In hard pruning, we take top k (k = 5) important nodes and their h (h = L − 1 = 4) hop neighbors. The SED predictor contains an 8-layer GIN with 64 hidden units at every layer. We use M = 5 heads to produce the final prediction. More details about model hyperparameters and platforms are given in B.2. |
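The pseudocode extracted above (Algorithm 1) can be read as the following minimal Python sketch. Every function here (`emb`, `gat`, `qal`, `mlp_concat`, `prune`, `pred`) is a hypothetical stub standing in for the paper's learned modules; the sketch only mirrors the control flow of Prune4SED, not the actual model.

```python
# Hedged sketch of Algorithm 1 (Prune4SED) control flow.
# All module functions are placeholder stubs, NOT the paper's implementation.

def emb(x):             # stand-in for node-feature embedding EMB(X)
    return [float(v) for v in x]

def gat(h, edges):      # stand-in for a GAT layer over the edge set
    return [v + 0.0 for v in h]

def qal(h_t, h_q):      # stand-in for the query-aware (QAL) block, eq. 10
    shift = sum(h_q) / max(len(h_q), 1)
    return [v + shift for v in h_t]

def mlp_concat(layers):  # stand-in for MLP(CONCAT(H_t^1, ..., H_t^L))
    return [sum(col) for col in zip(*layers)]

def prune(h_t, g_t, m):  # stand-in for head-m hard pruning PRUNE_m
    return g_t           # real model keeps top-k nodes + their h-hop neighbors

def pred(g_pruned, g_q):  # stand-in for the GIN-based SED predictor PRED
    return float(len(g_pruned) - len(g_q))

def prune4sed(x_t, x_q, e_t, e_q, g_t, g_q, L=5, M=5):
    """Mirror of Algorithm 1: L query-aware stages, then M pruning heads."""
    h_t, h_q = emb(x_t), emb(x_q)
    layers = []
    for l in range(L):                       # query-aware representation learning
        if l > 0:
            h_t, h_q = gat(h_t, e_t), gat(h_q, e_q)
        h_t = qal(h_t, h_q)                  # QAL block from equation 10
        layers.append(h_t)
    h_final = mlp_concat(layers)
    preds = [pred(prune(h_final, g_t, m), g_q) for m in range(M)]  # multi-head pruning
    return sum(preds) / M                    # ŷ = MEAN(ŷ_1, ..., ŷ_M)
```

The key structural points the sketch preserves are the `l = 1` special case (embedding instead of GAT), the concatenation of all L stage outputs, and averaging over the M pruning-head predictions.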