Nonparametric Network Models for Link Prediction
Authors: Sinead A. Williamson
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experimental Evaluation We begin by demonstrating the ability of the MDND to recover latent network structure in two synthetically-generated data sets, before performing a quantitative analysis on three real-world networks. ... Table 2 shows the log predictive likelihoods obtained on the above data sets. |
| Researcher Affiliation | Academia | Sinead A. Williamson EMAIL Department of Statistics and Data Science/ Department of Information, Risk and Operations Management, University of Texas at Austin |
| Pseudocode | No | The paper describes the inference method as constructing a collapsed Gibbs sampler by modifying an existing direct assignment sampler. It provides mathematical equations for predictive distributions and mentions techniques like split/merge moves and Restricted Gibbs method, but does not present these steps in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper mentions using 'existing C code released by the authors of Kemp et al. (2006)' and 'an existing R package (Chang, 2012)' for comparison models. However, it does not explicitly state that the code for the methodology described in this paper (Mixture of Dirichlet Network Distributions) is open-source, available, or provide any links. |
| Open Datasets | Yes | The paper uses the 'Militarized disputes' dataset, citing '(Maoz, 2005)', and the 'ENRON email network', citing '(Klimt and Yang, 2004)'. These are formal citations to published sources for the datasets. |
| Dataset Splits | Yes | For Macbeth: 'we split the play into 5 contiguous subsets, each containing (approximately) N/5 consecutive edges. We used these subsets to generate a 5-fold split into training data (4 of the 5 subsets) and a test set (the remaining one subset)'. For Militarized disputes: 'We split this data set into 10 subsets, and used these subsets to generate 10 train/test splits, where 9 subsets were used for training and 1 for evaluation.' For ENRON: 'We looked at five training sets, corresponding to the total set of emails sent and received in each of the first 5 months of 2000. For each data set, we evaluated predictive performance on the first 1000 emails sent in the subsequent month.' |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments. |
| Software Dependencies | Yes | The paper mentions using 'an existing R package (Chang, 2012)' for the Mixed-membership stochastic blockmodel, and its corresponding reference specifies 'R package version 1.3.2' for the 'lda' package. |
| Experiment Setup | Yes | For synthetic data, specific parameters are given: 'Dirichlet(5, 5, 5, 5, 5, 5) distribution', 'Gamma(5, 1)', and 'Gamma(0.5, 1)'. For the Symmetric DND, 'Dirichlet process concentration parameter τ = 1' is specified. For MMSB, 'K = 50 clusters' is used. For MDND evaluation, '100 samples of the test cluster assignments' are generated using the Gibbs sampler. |
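
The contiguous 5-fold split quoted for Macbeth in the Dataset Splits row can be illustrated with a minimal Python sketch. This is not the author's code: the function name `contiguous_folds` and its arguments are hypothetical, and it assumes that edges are indexed in their original order and that "approximately N/5" means block sizes differ by at most one edge.

```python
from typing import List, Tuple


def contiguous_folds(n_edges: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs built from contiguous blocks.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 1 <= k <= n_edges:
        raise ValueError("k must be between 1 and n_edges")
    # Block i covers the half-open index range [bounds[i], bounds[i + 1]).
    bounds = [(i * n_edges) // k for i in range(k + 1)]
    folds = []
    for i in range(k):
        # The held-out block is one run of consecutive edges.
        test = list(range(bounds[i], bounds[i + 1]))
        # Training data is everything before and after that block.
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n_edges))
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    for train, test in contiguous_folds(12, 5):
        print(len(train), test)
```

Calling the same helper with `k=10` gives a 9-to-1 train/test layout like the one described for the Militarized disputes data, although the quoted text does not say whether those subsets were contiguous.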