Nonparametric Network Models for Link Prediction

Authors: Sinead A. Williamson

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experimental Evaluation. We begin by demonstrating the ability of the MDND to recover latent network structure in two synthetically-generated data sets, before performing a quantitative analysis on three real-world networks. ... Table 2 shows the log predictive likelihoods obtained on the above data sets.
Researcher Affiliation | Academia | Sinead A. Williamson EMAIL Department of Statistics and Data Science / Department of Information, Risk and Operations Management, University of Texas at Austin
Pseudocode | No | The paper describes the inference method as constructing a collapsed Gibbs sampler by modifying an existing direct assignment sampler. It provides mathematical equations for the predictive distributions and mentions techniques such as split/merge moves and the Restricted Gibbs method, but does not present these steps in a structured pseudocode or algorithm block.
Open Source Code | No | The paper mentions using 'existing C code released by the authors of Kemp et al. (2006)' and 'an existing R package (Chang, 2012)' for comparison models. However, it does not explicitly state that the code for the methodology described in this paper (Mixture of Dirichlet Network Distributions) is open-source or available, nor does it provide any links.
Open Datasets | Yes | The paper uses the 'Militarized disputes' dataset, citing '(Maoz, 2005)', and the 'ENRON email network', citing '(Klimt and Yang, 2004)'. These are formal citations to published sources for the datasets.
Dataset Splits | Yes | For Macbeth: 'we split the play into 5 contiguous subsets, each containing (approximately) N/5 consecutive edges. We used these subsets to generate a 5-fold split into training data (4 of the 5 subsets) and a test set (the remaining one subset)'. For Militarized disputes: 'We split this data set into 10 subsets, and used these subsets to generate 10 train/test splits, where 9 subsets were used for training and 1 for evaluation.' For ENRON: 'We looked at five training sets, corresponding to the total set of emails sent and received in each of the first 5 months of 2000. For each data set, we evaluated predictive performance on the first 1000 emails sent in the subsequent month.'
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments.
Software Dependencies | Yes | The paper mentions using 'an existing R package (Chang, 2012)' for the mixed-membership stochastic blockmodel (MMSB), and the corresponding reference specifies 'R package version 1.3.2' for the 'lda' package.
Experiment Setup | Yes | For synthetic data, specific parameters are given: 'Dirichlet(5, 5, 5, 5, 5, 5) distribution', 'Gamma(5, 1)', and 'Gamma(0.5, 1)'. For the Symmetric DND, 'Dirichlet process concentration parameter τ = 1' is specified. For MMSB, 'K = 50 clusters' is used. For MDND evaluation, '100 samples of the test cluster assignments' are generated using the Gibbs sampler.
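
The contiguous 5-fold split quoted for Macbeth under Dataset Splits can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `contiguous_folds`, the toy edge list, and the rule for placing subset boundaries (sizes differ by at most one) are assumptions. The source only says that each subset contains approximately N/5 consecutive edges.

```python
def contiguous_folds(edges, k=5):
    """Yield (train, test) pairs from k contiguous, near-equal subsets.

    Assumption: boundaries sit at floor(i * N / k), so subset sizes
    differ by at most one edge.
    """
    n = len(edges)
    bounds = [(i * n) // k for i in range(k + 1)]
    subsets = [edges[bounds[i]:bounds[i + 1]] for i in range(k)]
    for held_out in range(k):
        # One contiguous subset is the test set; the other k - 1 form the training set.
        test = subsets[held_out]
        train = [e for i, s in enumerate(subsets) if i != held_out for e in s]
        yield train, test


# Hypothetical toy edge list: (sender, recipient) pairs in order of occurrence.
edges = [(i % 4, (i + 1) % 4) for i in range(12)]
splits = list(contiguous_folds(edges, k=5))
```

The same helper with `k=10` gives ten train/test splits of the form quoted for the Militarized disputes data, though the source does not say whether those subsets were contiguous.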