Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
Authors: Daeho Um, Yoonji Lee, Jiwoong Park, Seulki Park, Yuneil Yeo, Seong Jin Ahn
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we demonstrate that our virtual feature scheme effectively serves as a bridge between existing diffusion-based methods and heterogeneous graphs, maintaining the advantages of these methods. Furthermore, we confirm that adjusting the importance of each edge type leads to significant performance gains on heterogeneous graphs. Extensive experimental results demonstrate the superiority of our scheme in both semi-supervised node classification and link prediction tasks on heterogeneous graphs with missing rates ranging from low to exceedingly high. |
| Researcher Affiliation | Collaboration | Daeho Um (AI Center, Samsung Electronics), Yoonji Lee (Samsung Electronics), Jiwoong Park (Department of Electrical and Computer Engineering, Texas A&M University), Seulki Park (University of Michigan), Yuneil Yeo (Department of Civil and Environmental Engineering, UC Berkeley), Seong Jin Ahn (KAIST) |
| Pseudocode | No | The paper describes the proposed method using mathematical formulas and descriptive text in sections 4.1, 4.2, 4.3, and 4.4, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | The source code is available at https://github.com/daehoum1/hetgfd. |
| Open Datasets | Yes | Data Setting. We conduct experiments on three widely used heterogeneous graph datasets (ACM, DBLP, and IMDB) (Jin et al., 2021) from different domains. Detailed descriptions of these datasets and their sources can be found in Appendix B.2. ... We downloaded all the datasets used in this paper from the GitHub repository for Jin et al. (2021). ... In the protein-protein interaction networks (PPI) dataset (Zitnik & Leskovec, 2017)... |
| Dataset Splits | Yes | We utilize the node split suggested in Jin et al. (2021), which uses 10% nodes for training, 10% nodes for validation, and 80% nodes for testing. ... For the link prediction splits, as described in Kipf & Welling (2016b), we divide target edges into training, validation, and testing sets, comprising 10%, 5%, and 85% of the edges, respectively. ... We use 80% nodes for training, 10% nodes for validation, and 10% nodes for testing. |
| Hardware Specification | Yes | All experiments are conducted with an Intel Core i5-6600 CPU @ 3.30 GHz and a single GPU (NVIDIA GeForce RTX 2080 Ti). |
| Software Dependencies | No | The paper mentions PyTorch and PyTorch Geometric as implementation frameworks and cites the relevant papers for them, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We tune hyperparameters for training downstream GNN models and conduct a grid search based on the validation sets. Specifically, we search for the optimal number of layers from {1, 2, 3} and the learning rate from {0.1, 0.01, 0.001, 0.0001}. We set the hidden dimension to 64 for all the models. ... The maximum number of epochs is set to 1000 and we apply an early stopping strategy with a patience of 200 epochs. ... To find the optimal hyperparameters α and β for HetGFD, we perform a grid search on validation sets. The search range is set to {(α, β) \| α ∈ {0.9, 0.7, 0.5, 0.3, 0.1}, β ∈ {0.99, 0.9, 0.8, 0.5, 0.4, 0.2, 0.1, 0.05}}. We set the value of K to 100. |
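The (α, β) grid search quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: `validate` is a hypothetical stand-in for training HetGFD with a given (α, β) pair and returning a validation score, while the two search ranges are the ones stated in the paper.

```python
from itertools import product

# Hypothetical validation routine. In the real pipeline this would run
# HetGFD feature diffusion with (alpha, beta), train the downstream GNN,
# and return validation accuracy. Here it is a toy score for illustration.
def validate(alpha, beta):
    return -((alpha - 0.5) ** 2 + (beta - 0.5) ** 2)

# Search ranges quoted from the paper's experiment setup.
ALPHAS = [0.9, 0.7, 0.5, 0.3, 0.1]
BETAS = [0.99, 0.9, 0.8, 0.5, 0.4, 0.2, 0.1, 0.05]

def grid_search():
    """Exhaustively evaluate all 5 x 8 = 40 (alpha, beta) pairs and
    keep the configuration with the best validation score."""
    best_score, best_cfg = float("-inf"), None
    for alpha, beta in product(ALPHAS, BETAS):
        score = validate(alpha, beta)
        if score > best_score:
            best_score, best_cfg = score, (alpha, beta)
    return best_cfg

if __name__ == "__main__":
    print(grid_search())  # (0.5, 0.5) under the toy validate above
```

With the toy `validate`, the search selects (0.5, 0.5), the pair closest to the toy optimum; with a real training loop, the same structure selects the pair maximizing validation accuracy.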