reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Modelling Populations of Interaction Networks via Distance Metrics

Authors: George Bolt, Simón Lunagómez, Christopher Nemeth

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through simulation studies, we demonstrate the robustness and efficiency of our approach, and we showcase its applicability with a case study on a location-based social network (LSBN) dataset.
Researcher Affiliation	Academia	George Bolt EMAIL STOR-i Centre for Doctoral Training Lancaster University, Lancaster, UK, LA1 4YF Simón Lunagómez EMAIL Department of Statistics Instituto Tecnológico Autónomo de México (ITAM) Río Hondo 1, Altavista, Álvaro Obregón, 01080 Ciudad de México, CDMX, Mexico Christopher Nemeth EMAIL School of Mathematical Sciences Lancaster University, Lancaster, UK, LA1 4YF
Pseudocode	Yes	Algorithm 1: Involutive exchange (iExchange) algorithm
Open Source Code	No	The paper includes a license for the content of the paper (CC-BY 4.0) and attribution requirements, but does not explicitly state that the source code for the methodology described in the paper is openly available or provide a link to a code repository.
Open Datasets	Yes	As a motivating example, consider the Foursquare check-in dataset of Yang et al. (2015).
Dataset Splits	No	In the data analysis section (Section 7), for the Foursquare dataset, the paper states: 'This left a total of 402 observations, from which we extracted a subset of 50 to analyse'. However, it does not specify how this subset of 50 observations was further split into training, validation, or test sets for the analysis described. While simulation studies (Section 6.3) mention 'sampled training and testing data $\{S^{(i)}\}_{i=1}^{n+n_{\text{test}}}$', these are for synthetic data and not specific, fixed splits for the Foursquare dataset analysis.
Hardware Specification	Yes	The total run time for obtaining these posterior samples was approximately 18 hours, corresponding to an average of around 0.65 seconds per sample. This was implemented on a Dell Latitude 5440 laptop, with a 13th Gen Intel Core i7-1370P processor and 64 GB of RAM.
Software Dependencies	No	The paper does not provide specific version numbers for any software components or libraries used in the implementation of the methodology or experiments.
Experiment Setup	Yes	For our priors, we assumed $E^m \sim \text{SIM}(\hat{E}, 3.0)$, with $\hat{E}$ denoting the sample Fréchet mean of the observed data $\{E^{(i)}\}_{i=1}^{n}$, whilst we assumed $\gamma \sim \text{Gamma}(5, 1.67)$. Via our MCMC scheme, we then obtained a sample $\{(E^m_i, \gamma_i)\}_{i=1}^{M}$ from the posterior $p(E^m, \gamma \mid \{E^{(i)}\}_{i=1}^{n})$, obtaining a total of 100,000 samples. In each iteration, when sampling the 50 auxiliary data points, we took a lag of 50 between and discarded the first 4,000 as burn-in. From the 100,000 posterior samples, we discarded the first half as burn-in, and took a lag of 50 between samples, leaving a final M = 1000 samples.
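
The figures quoted in the Hardware Specification and Experiment Setup rows can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper name `thin_chain` and the stand-in chain of integers are assumptions made here, not code from the paper, which (per the Open Source Code row) does not link a repository.

```python
# Hypothetical helper, not taken from the paper: standard burn-in plus thinning.
def thin_chain(samples, burn_in, lag):
    """Drop the first `burn_in` draws, then keep every `lag`-th remaining draw."""
    return samples[burn_in::lag]


total_draws = 100_000                # posterior draws reported in the paper
chain = list(range(total_draws))     # stand-in for the actual MCMC output

# Discard the first half as burn-in and keep one draw in every 50.
kept = thin_chain(chain, burn_in=total_draws // 2, lag=50)
n_kept = len(kept)

# Reported run time of roughly 18 hours, spread over all draws.
run_time_hours = 18
seconds_per_draw = run_time_hours * 3600 / total_draws

print(n_kept, seconds_per_draw)
```

Under these assumptions the thinned chain has 1000 draws, matching the stated M = 1000, and the per-draw cost is about 0.65 seconds, matching the quoted average.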