Heterogeneous Graph Neural Network on Semantic Tree
Authors: Mingyu Guan, Jack W Stokes, Qinlong Luo, Fuchen Liu, Purvanshi Mehta, Elnaz Nouri, Taesoo Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation of HETTREE on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks and efficiently scales to large real-world graphs with millions of nodes and edges. We conduct extensive experiments on six heterogeneous graphs to answer the following questions. Q1. How does HETTREE compare to the state-of-the-art overall on open benchmarks? Q2. How does HETTREE perform in a practical compromised account detection on a noisy email graph? Q3. How does each component of HETTREE contribute to the performance gain? Q4. Is HETTREE practical w.r.t. running time and memory usage? |
| Researcher Affiliation | Collaboration | Mingyu Guan1, Jack W. Stokes2, Qinlong Luo2, Fuchen Liu2, Purvanshi Mehta3, Elnaz Nouri2, Taesoo Kim1 1Georgia Institute of Technology, 2Microsoft Corporation, 3Lica World Inc |
| Pseudocode | No | The paper describes its methodology in Section 4, 'Methodology', using textual descriptions and mathematical formulas, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/microsoft/HetTree |
| Open Datasets | Yes | We evaluate HETTREE on four graphs from the HGB (Lv et al. 2021) benchmark: DBLP, IMDB, ACM, and Freebase, a citation graph Ogbn-Mag from the OGB benchmark (Hu et al. 2020a) and a real-world email dataset collected from a commercial email platform. |
| Dataset Splits | No | The paper mentions using datasets from the HGB and OGB benchmarks, which typically define their own splits, and also discusses a 'training set' and 'non-training set' for label aggregation. However, explicit percentages or sample counts for training, validation, and test splits are not provided within the paper's main text for any of the datasets used. |
| Hardware Specification | Yes | All of the experiments were conducted on a machine with dual 12-core Intel Xeon Gold 6226 CPU, 384 GB of RAM, and one NVIDIA Tesla A100 80GB GPU. |
| Software Dependencies | Yes | The server runs 64-bit Red Hat Enterprise Linux 7.6 with CUDA library v11.8, PyTorch v1.12.0, and DGL v0.9. |
| Experiment Setup | No | The paper describes the model architecture and general setup, such as using a 'mean aggregator' and '2-hop feature propagation' for fair comparison with baselines, and states that 'All experimental results reported are averaged over five random seeds.' However, specific hyperparameters like learning rate, batch size, number of epochs, or optimizer settings are not explicitly detailed in the main text. |
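For readers attempting to rebuild the environment described in the Software Dependencies row, the reported versions can be captured in a requirements file. This is a minimal sketch, not taken from the paper's repository: the exact pin formats are assumptions (the paper says only "DGL v0.9", which may resolve as `dgl==0.9.0` or require a CUDA-specific build such as `dgl-cu118`), and the CUDA 11.8 toolkit must be installed separately on the host.

```
# requirements.txt -- sketch of the paper's reported environment (pins assumed)
torch==1.12.0      # paper reports PyTorch v1.12.0
dgl==0.9.0         # paper reports DGL v0.9; a CUDA build (e.g., dgl-cu118) may be needed
```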