Fitting Autoregressive Graph Generative Models through Maximum Likelihood Estimation
Authors: Xu Han, Xiaohui Chen, Francisco J. R. Ruiz, Li-Ping Liu
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that fitting autoregressive graph models via variational inference improves their qualitative and quantitative performance, and that the improved model and inference network further boost performance. We also conduct extensive experiments showing the benefits of the generative model and the approximate posterior over the approach of Chen et al. (2021). In Section 7, we analyze the empirical performance of graph generative models trained with VI. |
| Researcher Affiliation | Collaboration | Xu Han EMAIL Department of Computer Science Tufts University Medford, MA 02155, USA; Xiaohui Chen EMAIL Department of Computer Science Tufts University Medford, MA 02155, USA; Francisco J. R. Ruiz EMAIL DeepMind 5 New Street, London, UK; Li-Ping Liu EMAIL Department of Computer Science Tufts University Medford, MA 02155, USA |
| Pseudocode | Yes | Algorithm 1 Autoregressive generation of adjacency matrices Algorithm 2 VI algorithm for training a graph model based on the adjacency matrix A |
| Open Source Code | Yes | The implementation of the proposed model is publicly available at https://github.com/tufts-ml/Graph-Generation-MLE. |
| Open Datasets | Yes | We use 8 datasets that are commonly used for benchmarking graph generative models: (1) Community-small: ... (2) Citeseer-small: ... (3) Enzymes: ... (4) Lung: ... (5) Yeast: ... (6) Cora: ... (7) SBM-assortative: ... (8) MMSBM: ... Graphs in the Lung and Yeast datasets represent structures of chemical compounds. Graphs in the Enzymes dataset represent protein tertiary structures. Since the Citeseer and Cora datasets each contain only a single graph, we sample subgraphs via random walk to form the corresponding datasets. |
| Dataset Splits | Yes | We split all datasets into three parts: the train set (80%), validation set (10%), and test set (10%). |
| Hardware Specification | Yes | Both methods run on an RTX 3080 GPU. |
| Software Dependencies | Yes | For ROS-VI and Rout-VI, we use the Nauty package (McKay and Piperno, 2013) to compute \|Π(A)\| (see Section 4). |
| Experiment Setup | Yes | We compare DAGG against three recent graph generative models: GraphDF (Luo et al., 2021), GraphRNN (You et al., 2018), and GraphGen (Goyal et al., 2020). We use their original training methods with default hyperparameters. ... For each model, we use L = 1,000 samples to estimate the test log-likelihood via importance sampling (Eq. 13), using the variational distribution qφ(π \| G) as the proposal. ... In our experiments, we found that L = 1,000 gives an accurate estimation (see Figure 5). |
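The importance-sampling log-likelihood estimate mentioned in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's implementation: the callables `log_joint` (for log p(G, π)), `log_q` (for log qφ(π | G)), and `sample_pi` (a draw π ~ qφ(π | G)) are hypothetical stand-ins for the model and inference network.

```python
import math

def estimate_log_likelihood(log_joint, log_q, sample_pi, L=1000):
    """Importance-sampling estimate of log p(G) with proposal q(pi | G).

    Averages the importance weights p(G, pi) / q(pi | G) over L samples
    and returns the log of that average, computed via log-sum-exp for
    numerical stability.

    log_joint(pi) -> log p(G, pi)   (hypothetical model callable)
    log_q(pi)     -> log q(pi | G)  (hypothetical proposal density)
    sample_pi()   -> one node ordering drawn from q(pi | G)
    """
    log_weights = []
    for _ in range(L):
        pi = sample_pi()
        log_weights.append(log_joint(pi) - log_q(pi))
    # log-mean-exp: log( (1/L) * sum_l exp(log_weights[l]) )
    m = max(log_weights)
    return m + math.log(sum(math.exp(w - m) for w in log_weights) / L)
```

With a degenerate proposal (a single ordering with q(π | G) = 1) the estimate reduces exactly to log p(G, π), which is a quick sanity check; in practice the variance of the estimate depends on how well qφ(π | G) matches the posterior over orderings, which is why the paper checks that L = 1,000 suffices (Figure 5).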