Heterogeneous Graph Neural Network on Semantic Tree
Authors: Mingyu Guan, Jack W Stokes, Qinlong Luo, Fuchen Liu, Purvanshi Mehta, Elnaz Nouri, Taesoo Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation of HETTREE on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks and efficiently scales to large real-world graphs with millions of nodes and edges. We conduct extensive experiments on six heterogeneous graphs to answer the following questions. Q1. How does HETTREE compare to the state-of-the-art overall on open benchmarks? Q2. How does HETTREE perform in a practical compromised account detection on a noisy email graph? Q3. How does each component of HETTREE contribute to the performance gain? Q4. Is HETTREE practical w.r.t. running time and memory usage? |
| Researcher Affiliation | Collaboration | Mingyu Guan1, Jack W. Stokes2, Qinlong Luo2, Fuchen Liu2, Purvanshi Mehta3, Elnaz Nouri2, Taesoo Kim1 1Georgia Institute of Technology, 2Microsoft Corporation, 3Lica World Inc |
| Pseudocode | No | The paper describes its methodology in Section 4, 'Methodology', using textual descriptions and mathematical formulas, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/microsoft/HetTree |
| Open Datasets | Yes | We evaluate HETTREE on four graphs from the HGB (Lv et al. 2021) benchmark: DBLP, IMDB, ACM, and Freebase, a citation graph Ogbn-Mag from the OGB benchmark (Hu et al. 2020a) and a real-world email dataset collected from a commercial email platform. |
| Dataset Splits | No | The paper mentions using datasets from the HGB and OGB benchmarks, which typically define their own splits, and also discusses a 'training set' and 'non-training set' for label aggregation. However, explicit percentages or sample counts for training, validation, and test splits are not provided within the paper's main text for any of the datasets used. |
| Hardware Specification | Yes | All of the experiments were conducted on a machine with dual 12-core Intel Xeon Gold 6226 CPU, 384 GB of RAM, and one NVIDIA Tesla A100 80GB GPU. |
| Software Dependencies | Yes | The server runs 64-bit Red Hat Enterprise Linux 7.6 with CUDA library v11.8, PyTorch v1.12.0, and DGL v0.9. |
| Experiment Setup | No | The paper describes the model architecture and general setup, such as using a 'mean aggregator' and '2-hop feature propagation' for fair comparison with baselines, and states that 'All experimental results reported are averaged over five random seeds.' However, specific hyperparameters like learning rate, batch size, number of epochs, or optimizer settings are not explicitly detailed in the main text. |
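For readers attempting to rebuild the environment described in the Software Dependencies row, the reported versions can be captured in a requirements file. This is a minimal sketch, not taken from the paper's repository: the exact pin formats are assumptions (the paper says only "DGL v0.9", which may resolve as `dgl==0.9.0` or require a CUDA-specific build such as `dgl-cu118`), and the CUDA 11.8 toolkit must be installed separately on the host.

```
# requirements.txt -- sketch of the paper's reported environment (pins assumed)
torch==1.12.0      # paper reports PyTorch v1.12.0
dgl==0.9.0         # paper reports DGL v0.9; a CUDA build (e.g., dgl-cu118) may be needed
```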