Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning
Authors: Linfeng Liu, Xu Han, Dawei Zhou, Liping Liu
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed model establishes new state-of-the-art results across seven benchmark datasets. Extensive analysis of the model indicates that the proposed model can reasonably prune the target graph for SED computation. The implementation of our algorithm is released at our Github repo: https://github.com/tufts-ml/Prune4SED. |
| Researcher Affiliation | Collaboration | Linfeng Liu (Meta, Boston); Xu Han (Department of Computer Science, Tufts University); Dawei Zhou (Department of Computer Science, Virginia Tech); Li-Ping Liu (Department of Computer Science, Tufts University) |
| Pseudocode | Yes | Algorithm 1 Prune4SED. Input: G_q, G_t. 1: Query-aware representation learning: 2: for l = 1 to L do 3: (H̃_t^l, H_q^l) = (EMB(X_t), EMB(X_q)) if l = 1, else (GAT(H_t^{l-1}, E_t), GAT(H_q^{l-1}, E_q)) 4: H_t^l = QAL(H̃_t^l, H_q^l) {QAL block from equation 10} 5: end for 6: H_t = MLP(CONCAT(H_t^1, ..., H_t^L)) 7: for m = 1 to M do {Multi-head pruning} 8: G'_{t,m} = PRUNE_m(H_t, G_t) 9: ŷ_m = PRED(G'_{t,m}, G_q) 10: end for 11: ŷ = MEAN(ŷ_1, ..., ŷ_M) 12: return ŷ |
| Open Source Code | Yes | The implementation of our algorithm is released at our Github repo: https://github.com/tufts-ml/Prune4SED. |
| Open Datasets | Yes | We use seven datasets (AIDS, CiteSeer, Cora_ML, Amazon, DBLP, PubMed, and Protein) to evaluate our model for SED approximation. B.1 provides more descriptions about these datasets. Query graphs of the AIDS dataset are known functional groups from Ranu & Singh (2009). https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data |
| Dataset Splits | Yes | From each dataset, we randomly pair target and query graphs to get 100K pairs for training, 10K for validation, and another 10K for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It mentions "Part of the computation resource is provided by Amazon AWS" and "The solver runs on a 64 core machine", but these are too general or refer to the solver environment, not their specific experimental setup. |
| Software Dependencies | No | The paper mentions software components like "GATv2Conv (Brody et al., 2021)", "MIP-F2 solver (Lerouge et al., 2017)", and "GEDLIB (Blumenthal et al., 2019; 2020)", but it does not specify exact version numbers for these or other key software components used in their experiments. |
| Experiment Setup | Yes | By default, we use L = 5 stages for Prune4SED. In hard pruning, we take top k (k = 5) important nodes and their h (h = L − 1 = 4) hop neighbors. The SED predictor contains an 8-layer GIN with 64 hidden units at every layer. We use M = 5 heads to produce the final prediction. More details about model hyperparameters and platforms are given in B.2. |
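The pseudocode extracted above (Algorithm 1) can be read as the following minimal Python sketch. Every function here (`emb`, `gat`, `qal`, `mlp_concat`, `prune`, `pred`) is a hypothetical stub standing in for the paper's learned modules; the sketch only mirrors the control flow of Prune4SED, not the actual model.

```python
# Hedged sketch of Algorithm 1 (Prune4SED) control flow.
# All module functions are placeholder stubs, NOT the paper's implementation.

def emb(x):             # stand-in for node-feature embedding EMB(X)
    return [float(v) for v in x]

def gat(h, edges):      # stand-in for a GAT layer over the edge set
    return [v + 0.0 for v in h]

def qal(h_t, h_q):      # stand-in for the query-aware (QAL) block, eq. 10
    shift = sum(h_q) / max(len(h_q), 1)
    return [v + shift for v in h_t]

def mlp_concat(layers):  # stand-in for MLP(CONCAT(H_t^1, ..., H_t^L))
    return [sum(col) for col in zip(*layers)]

def prune(h_t, g_t, m):  # stand-in for head-m hard pruning PRUNE_m
    return g_t           # real model keeps top-k nodes + their h-hop neighbors

def pred(g_pruned, g_q):  # stand-in for the GIN-based SED predictor PRED
    return float(len(g_pruned) - len(g_q))

def prune4sed(x_t, x_q, e_t, e_q, g_t, g_q, L=5, M=5):
    """Mirror of Algorithm 1: L query-aware stages, then M pruning heads."""
    h_t, h_q = emb(x_t), emb(x_q)
    layers = []
    for l in range(L):                       # query-aware representation learning
        if l > 0:
            h_t, h_q = gat(h_t, e_t), gat(h_q, e_q)
        h_t = qal(h_t, h_q)                  # QAL block from equation 10
        layers.append(h_t)
    h_final = mlp_concat(layers)
    preds = [pred(prune(h_final, g_t, m), g_q) for m in range(M)]  # multi-head pruning
    return sum(preds) / M                    # ŷ = MEAN(ŷ_1, ..., ŷ_M)
```

The key structural points the sketch preserves are the `l = 1` special case (embedding instead of GAT), the concatenation of all L stage outputs, and averaging over the M pruning-head predictions.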