reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Detecting Latent Communities in Network Formation Models

Authors: Shujie Ma, Liangjun Su, Yichong Zhang

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets. In this section, we conduct some simulations to evaluate the performance of our procedure. In this section, we apply the proposed method to study the community structure of social network datasets.
Researcher Affiliation	Academia	Shujie Ma EMAIL Department of Statistics University of California Riverside, CA 92521, USA Liangjun Su EMAIL School of Economics and Management Tsinghua University Beijing, 100084, China Yichong Zhang EMAIL School of Economics Singapore Management University 178903, Singapore
Pseudocode	No	The paper describes a multi-step estimation procedure in Section 3 using numbered bullet points to detail the steps. While it outlines a method, it is presented as a narrative description rather than structured pseudocode or a formally labeled algorithm block.
Open Source Code	No	The text does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described in this paper.
Open Datasets	Yes	Pokec is a popular on-line social network in Slovakia. The whole dataset has more than 1.6 million users, and it can be downloaded from https://snap.stanford.edu/data/soc-Pokec.html. The dataset contains Facebook friendship networks at one hundred American colleges and universities at a single point in time. It was provided and analyzed by Traud et al. (2012), and can be downloaded from https://archive.org/details/oxford-2005-facebook-matrix.
Dataset Splits	No	The paper describes data generation mechanisms for simulations and initial data cleaning for empirical applications (e.g., 'select the first 10000 users', 'After deleting the nodes with missing values'). However, it does not specify explicit training, validation, or test splits for evaluating the model on these datasets, nor does it refer to predefined splits with citations for this purpose.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments or simulations.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. Although it mentions 'Python code by Graham (2017)', this refers to a third-party tool and does not specify version information for any software used in the authors' implementation.
Experiment Setup	Yes	We select the number of communities $K_1$ by an eigenvalue ratio method given as follows. Let $\hat{\sigma}_{1,1} \geq \cdots \geq \hat{\sigma}_{K_{\max},1}$ be the first $K_{\max}$ singular values of the SVD decomposition of $\hat{\Theta}_1$ from the nuclear norm penalization method given in Section 3.1.1. We estimate $K_1$ by $\hat{K}_1$ defined in (16) by setting $c_1 = 0.1$ and $K_{\max} = 10$. We set the tuning parameter $\lambda_n = C_\lambda\{\sqrt{n\bar{Y}} + \sqrt{\log n}\}/\{n(n-1)\}$ with $C_\lambda = 2$, and similarly for $\lambda_n^{(1)}$. To require that the estimator $\hat{\Theta}_{l,ij}$ is bounded by finite constants, we let $M = 2$ and $C_M = 2$. The performance of the method is not sensitive to the choice of these finite constants. Define the mean squared error (MSE) of the nuclear norm estimator $\hat{\Theta}_l$ for $\Theta_l$ as $\sum_{i \neq j}(\hat{\Theta}_{l,ij} - \Theta_{l,ij})^2/\{n(n-1)\}$ for $l = 0, 1$. Table 1 reports the MSEs for $\hat{\Theta}_l$, the mean of $\hat{K}_1$, and the percentage of correctly estimating $K_1$ based on the 200 realizations.
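The two quantities in the Experiment Setup entry, the eigenvalue ratio estimate of $K_1$ and the off-diagonal MSE, can be illustrated with a minimal NumPy sketch. Equation (16) is not reproduced on this page, so the selection rule below is an assumption: it uses one common ratio form, $\hat{K} = \arg\max_{1 \le k < K_{\max}} (\hat{\sigma}_k + c)/(\hat{\sigma}_{k+1} + c)$, where the hypothetical constant `c` stands in for $c_1$ and keeps the ratio finite when trailing singular values are numerically zero. The function names `eigen_ratio_k` and `offdiag_mse` and the synthetic rank-3 test matrix are illustrative only and are not taken from the authors' code (the entries above note that no code was released).

```python
import numpy as np


def eigen_ratio_k(theta_hat, k_max=10, c=0.1):
    """Hypothetical eigenvalue-ratio rule for the number of communities.

    ASSUMPTION: the paper's equation (16) is not shown here; this sketch uses
    K_hat = argmax_{1 <= k < k_max} (s_k + c) / (s_{k+1} + c),
    where s_1 >= s_2 >= ... are the singular values of theta_hat.
    """
    # np.linalg.svd returns singular values in descending order.
    s = np.linalg.svd(theta_hat, compute_uv=False)[:k_max]
    ratios = (s[:-1] + c) / (s[1:] + c)
    # argmax gives a 0-based index; the k-th ratio compares s_k with s_{k+1}.
    return int(np.argmax(ratios)) + 1


def offdiag_mse(theta_hat, theta):
    """MSE over off-diagonal entries: sum_{i != j} (hat - true)^2 / {n(n-1)}."""
    n = theta.shape[0]
    mask = ~np.eye(n, dtype=bool)  # True everywhere except the diagonal
    diff = (theta_hat - theta)[mask]
    return float(np.sum(diff ** 2) / (n * (n - 1)))


# Illustrative example (synthetic data, not from the paper):
# an exactly rank-3 symmetric 20 x 20 matrix with singular values 10, 8, 6.
rng = np.random.default_rng(0)
n = 20
q, _ = np.linalg.qr(rng.standard_normal((n, 3)))  # 20 x 3 orthonormal columns
theta = q @ np.diag([10.0, 8.0, 6.0]) @ q.T

k_hat = eigen_ratio_k(theta, k_max=10, c=0.1)
mse_shift = offdiag_mse(theta + 0.5, theta)
print("estimated K:", k_hat)
print("MSE after a constant 0.5 shift:", mse_shift)
```

Because the fourth singular value is numerically zero, the ratio $(6 + c)/(0 + c)$ dominates and the rule returns 3. Adding `c` to both numerator and denominator is one way to avoid division by a near-zero singular value; how $c_1$ actually enters (16) should be checked against the paper.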