Detecting Latent Communities in Network Formation Models

Authors: Shujie Ma, Liangjun Su, Yichong Zhang

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets. In this section, we conduct some simulations to evaluate the performance of our procedure. In this section, we apply the proposed method to study the community structure of social network datasets.
Researcher Affiliation | Academia | Shujie Ma (EMAIL), Department of Statistics, University of California, Riverside, CA 92521, USA; Liangjun Su (EMAIL), School of Economics and Management, Tsinghua University, Beijing, 100084, China; Yichong Zhang (EMAIL), School of Economics, Singapore Management University, 178903, Singapore
Pseudocode | No | The paper describes a multi-step estimation procedure in Section 3 using numbered bullet points to detail the steps. While it outlines a method, it is presented as a narrative description rather than structured pseudocode or a formally labeled algorithm block.
Open Source Code | No | The text does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described in this paper.
Open Datasets | Yes | Pokec is a popular on-line social network in Slovakia. The whole dataset has more than 1.6 million users, and it can be downloaded from https://snap.stanford.edu/data/soc-Pokec.html. The dataset contains Facebook friendship networks at one hundred American colleges and universities at a single point in time. It was provided and analyzed by Traud et al. (2012), and can be downloaded from https://archive.org/details/oxford-2005-facebook-matrix.
Dataset Splits | No | The paper describes data generation mechanisms for simulations and initial data cleaning for empirical applications (e.g., 'select the first 10000 users', 'After deleting the nodes with missing values'). However, it does not specify explicit training, validation, or test splits for evaluating the model on these datasets, nor does it refer to predefined splits with citations for this purpose.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments or simulations.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. Although it mentions 'Python code by Graham (2017)', this refers to a third-party tool and does not specify version information for any software used in the authors' implementation.
Experiment Setup | Yes | We select the number of communities $K_1$ by an eigenvalue ratio method given as follows. Let $\hat\sigma_{1,1} \geq \cdots \geq \hat\sigma_{K_{\max},1}$ be the first $K_{\max}$ singular values of the SVD decomposition of $\hat\Theta_1$ from the nuclear norm penalization method given in Section 3.1.1. We estimate $K_1$ by $\hat K_1$ defined in (16) by setting $c_1 = 0.1$ and $K_{\max} = 10$. We set the tuning parameter $\lambda_n = C_\lambda\{\sqrt{n \bar Y} + \sqrt{\log n}\}/\{n(n-1)\}$ with $C_\lambda = 2$ and similarly for $\lambda_n^{(1)}$. To require that the estimator of $\hat\Theta_{l,ij}$ is bounded by finite constants, we let $M = 2$ and $C_M = 2$. The performance of the method is not sensitive to the choice of these finite constants. Define the mean squared error (MSE) of the nuclear norm estimator $\hat\Theta_l$ for $\Theta_l$ as $\sum_{i \neq j}(\hat\Theta_{l,ij} - \Theta_{l,ij})^2/\{n(n-1)\}$ for $l = 0, 1$. Table 1 reports the MSEs for $\hat\Theta_l$, the mean of $\hat K_1$ and the percentage of correctly estimating $K_1$ based on the 200 realizations.
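The quantities quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (no code release is reported above). The function names are hypothetical. The tuning-parameter formula assumes that square roots and an average link frequency (written `y_bar` here) were lost when the text was extracted. The ratio rule is a generic singular-value-ratio selector with an assumed screening role for c1; the paper's actual criterion is its equation (16), which is not reproduced in this summary.

```python
import numpy as np


def offdiag_mse(theta_hat, theta):
    """Sum of squared errors over pairs i != j, divided by n(n - 1)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]
    mask = ~np.eye(n, dtype=bool)  # True for off-diagonal entries only
    return float(np.sum((theta_hat[mask] - theta[mask]) ** 2) / (n * (n - 1)))


def tuning_lambda(Y, c_lambda=2.0):
    """Assumed form: C * (sqrt(n * y_bar) + sqrt(log n)) / (n (n - 1))."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    mask = ~np.eye(n, dtype=bool)
    y_bar = Y[mask].mean()  # assumption: average off-diagonal link indicator
    return c_lambda * (np.sqrt(n * y_bar) + np.sqrt(np.log(n))) / (n * (n - 1))


def ratio_select(theta_hat, k_max=10, c1=0.1):
    """Generic singular-value-ratio rule (a stand-in, not equation (16))."""
    s = np.linalg.svd(np.asarray(theta_hat, dtype=float), compute_uv=False)
    s = s[:k_max]  # singular values are returned in descending order
    ratios = s[:-1] / np.maximum(s[1:], 1e-12)  # guard against division by zero
    # Hypothetical screening: drop candidates whose singular value is
    # below c1 times the largest one, treating them as noise.
    ratios[s[:-1] < c1 * s[0]] = -np.inf
    return int(np.argmax(ratios)) + 1  # convert 0-based index to a count
```

For a noiseless two-block matrix, `ratio_select` should return 2, because the gap between the second and third singular values dominates every other ratio. On estimated matrices the result depends on the screening threshold, so the rule above should be checked against equation (16) of the paper before any replication attempt.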