reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nonparametric Network Models for Link Prediction

Authors: Sinead A. Williamson

JMLR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 5 (Experimental Evaluation): 'We begin by demonstrating the ability of the MDND to recover latent network structure in a two synthetically-generated data sets, before performing a quantitative analysis on three real-world networks. ... Table 2 shows the log predictive likelihoods obtained on the above data sets.'
Researcher Affiliation	Academia	Sinead A. Williamson, Department of Statistics and Data Science / Department of Information, Risk and Operations Management, University of Texas at Austin
Pseudocode	No	The paper describes the inference method as a collapsed Gibbs sampler built by modifying an existing direct assignment sampler. It provides mathematical equations for the predictive distributions and mentions techniques such as split/merge moves and the Restricted Gibbs method, but does not present these steps as structured pseudocode or an algorithm block.
Open Source Code	No	The paper mentions using 'existing C code released by the authors of Kemp et al. (2006)' and 'an existing R package (Chang, 2012)' for the comparison models. However, it does not state that the code for the method described in this paper (Mixture of Dirichlet Network Distributions) is open source or publicly available, and it provides no links.
Open Datasets	Yes	The paper uses the 'Militarized disputes' dataset, citing '(Maoz, 2005)', and the 'ENRON email network', citing '(Klimmt and Yang, 2004)'. These are formal citations to published sources for the datasets.
Dataset Splits	Yes	For Macbeth: 'we split the play into 5 contiguous subsets, each containing (approximately) N/5 consecutive edges. We used these subsets to generate a 5-fold split into training data (4 of the 5 subsets) and a test set (the remaining one subset)'. For Militarized disputes: 'We split this data set into 10 subsets, and used these subsets to generate 10 train/test splits, where 9 subsets were used for training and 1 for evaluation.' For ENRON: 'We looked at five training sets, corresponding to the total set of emails sent and received in each of the first 5 months of 2000. For each data set, we evaluated predictive performance on the first 1000 emails sent in the subsequent month.'
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments.
Software Dependencies	Yes	The paper mentions using 'an existing R package (Chang, 2012)' for the Mixed-membership stochastic blockmodel, and its corresponding reference specifies 'R package version 1.3.2' for the 'lda' package.
Experiment Setup	Yes	For synthetic data, specific parameters are given: 'Dirichlet(5, 5, 5, 5, 5, 5) distribution', 'Gamma(5, 1)', and 'Gamma(0.5, 1)'. For the Symmetric DND, 'Dirichlet process concentration parameter τ = 1' is specified. For MMSB, 'K = 50 clusters' is used. For MDND evaluation, '100 samples of the test cluster assignments' are generated using the Gibbs sampler.
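The split procedures quoted in the Dataset Splits row can be sketched in Python as follows. This is a minimal illustration, not the author's code (which, per the Open Source Code row, is not released). The function names `contiguous_folds` and `monthly_split` are hypothetical; edges are assumed to be a chronologically ordered list, and emails are assumed to be `(date, sender, recipient)` tuples. The quoted text does not say whether the 10 Militarized disputes subsets are contiguous, so `contiguous_folds` is only known to match the Macbeth description.

```python
from datetime import date


def contiguous_folds(edges, k):
    """Yield (train, test) pairs built from k contiguous, roughly equal blocks.

    Hypothetical helper: `edges` is assumed to be in chronological order.
    """
    n = len(edges)
    # Block i covers edges[bounds[i]:bounds[i + 1]]; sizes differ by at most one.
    bounds = [round(i * n / k) for i in range(k + 1)]
    blocks = [edges[bounds[i]:bounds[i + 1]] for i in range(k)]
    for i in range(k):
        # Train on the other k - 1 blocks, test on block i.
        train = [e for j, block in enumerate(blocks) if j != i for e in block]
        yield train, blocks[i]


def monthly_split(emails, year, month, n_test=1000):
    """Train on all emails dated in (year, month); test on the first
    n_test emails of the following month. Hypothetical helper."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    ordered = sorted(emails, key=lambda e: e[0])  # chronological order
    train = [e for e in ordered if (e[0].year, e[0].month) == (year, month)]
    test = [e for e in ordered if (e[0].year, e[0].month) == (next_year, next_month)]
    return train, test[:n_test]


# Hypothetical toy data, for illustration only.
emails = [
    (date(2000, 1, 20), "a", "c"),
    (date(2000, 2, 3), "c", "a"),
    (date(2000, 1, 5), "a", "b"),
    (date(2000, 2, 1), "b", "c"),
]
train, test = monthly_split(emails, 2000, 1, n_test=1)
```

For Macbeth, `contiguous_folds(edges, 5)` yields the five train/test pairs; for ENRON, calling `monthly_split(emails, 2000, m)` for `m` from 1 to 5 gives the five training months and their test sets.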
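The Experiment Setup row quotes the distributions used for the synthetic data and 100 Gibbs samples of the test cluster assignments. The sketch below, assuming NumPy, draws once from each quoted distribution and shows one standard way to combine per-sample log-likelihoods into a single log predictive likelihood (a numerically stable log-mean-exp). This entry does not state what each drawn quantity parameterizes or whether the paper combines its samples exactly this way; `log_mean_exp` and the placeholder log-likelihoods are hypothetical. Because the scale is 1, the Gamma(5, 1) and Gamma(0.5, 1) draws are the same under the shape-scale and shape-rate conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draws from the distributions quoted for the synthetic data sets.
# Their role in the MDND is not stated in this entry (assumption left open).
weights = rng.dirichlet([5.0] * 6)      # Dirichlet(5, 5, 5, 5, 5, 5): 6 weights summing to 1
g_a = rng.gamma(shape=5.0, scale=1.0)   # Gamma(5, 1)
g_b = rng.gamma(shape=0.5, scale=1.0)   # Gamma(0.5, 1)


def log_mean_exp(log_values):
    """Numerically stable log((1/S) * sum_s exp(log_values[s]))."""
    a = np.asarray(log_values, dtype=float)
    m = a.max()  # shift by the maximum so exp() cannot overflow
    return float(m + np.log(np.mean(np.exp(a - m))))


# Hypothetical per-sample test-set log-likelihoods, one per Gibbs sample (S = 100).
per_sample_loglik = rng.normal(loc=-5000.0, scale=3.0, size=100)
estimate = log_mean_exp(per_sample_loglik)
```

Subtracting the maximum before exponentiating matters here: `np.exp(-5000.0)` underflows to zero, so a direct `np.log(np.mean(np.exp(...)))` would return `-inf`.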