Community models for networks observed through edge nominations
Authors: Tianxi Li, Elizaveta Levina, Ji Zhu
JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed model in simulation studies on unweighted and weighted networks and under misspecified models. The method is applied to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools. |
| Researcher Affiliation | Academia | Tianxi Li EMAIL Department of Statistics University of Virginia Charlottesville, VA 22904, USA; Elizaveta Levina EMAIL; Ji Zhu EMAIL Department of Statistics University of Michigan Ann Arbor, MI 48109, USA |
| Pseudocode | Yes | Algorithm 1 (Right SC) Given an adjacency matrix A and the number of communities K: 1. Compute the rank K truncated SVD A, given by A = b U b D b V T . 2. Run the K-means clustering algorithm on rows of b V to assign each node to a community. Algorithm 2 (Parameter estimation for the NSBM by the method of moments) Given the adjacency matrix A and community labels c, for k = 1, 2, , K (obtained by, for example, right SC): 1. Set Til = P j Gl Aij nl for each i Gk and 1 l K. 2. Estimate θi for each i Gk by ˆθi = Tik 1 3. Find the set Ψk = {l : 1 l K, Til > 0 i Gk}. Set ˆBkl = 0 for each l / Ψk. 4. (a) Define Yil = log(Til + 1 nl ) for each i Gk, where the 1 nl is used to avoid overflow for the pathological case of Til = 0 for some i Gk. (b) Estimate λi for each i Gk by P l Ψk\{k}(Yik Yil) P l Ψk\{k} P j Gk(Yjk Yjl)/nk (8) (c) Estimate Bkl for each l Ψk \ {k} by ˆBkl = exp( 1 i Gk (Yik Yil)). (9) Algorithm 3 (Right SMST) Given an adjacency matrix A and the number of communities K: 1. Compute the rank-K truncated singular value decomposition A = b U b D b V T . 2. Run minimum spanning tree algorithm of Vu (2018) on b V : (a) Construct the undirected distance graph between n nodes based on the distance matrix, where the edge weight between i and j is the distance between b Vi and b Vj , the i-th and j-th rows of the matrix b V . (b) Find the minimum spanning tree of the distance graph. (c) Remove the K 1 edges with the highest weights from the minimum spanning tree . (d) Return the resulting connected components as clusters. |
| Open Source Code | No | The paper refers to the R package sna (Butts, 2020) for the Bayesian method, but does not provide open-source code or a repository link for its own proposed methodology (NSBM, Right SC, etc.). |
| Open Datasets | Yes | The data were collected by Clauset et al. (2015) via web crawling and contains data on 7856 faculty members from 112 business schools, recording the total number of Ph Ds from one institution hired by another institution. |
| Dataset Splits | No | The paper describes generating synthetic networks for simulation studies and applies the method to a real faculty hiring dataset. However, it does not provide specific details on training, validation, or test splits for any of these datasets in a way that would allow for reproduction of experimental partitioning. |
| Hardware Specification | No | The authors acknowledge Research Computing at the University of Virginia for providing computational resources and technical support that have contributed to the results reported within this publication. No specific hardware details (e.g., CPU, GPU models, or memory) are provided. |
| Software Dependencies | No | The paper mentions the 'R package sna (Butts, 2020)' for implementing a comparative Bayesian method. However, it does not specify any software dependencies with version numbers for its own proposed methods, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | For this set of experiments, networks are generated from the NSBM as follows: n = 1200 nodes are randomly assigned to K = 3 communities with equal probability. The matrix B has all diagonal elements equal to 1 and all off-diagonal elements equal to β. The parameters λi s are generated independently with log(λ) sampled uniformly from the interval ( t, t), and then rescaled to satisfy the constraint P ci=k λi = nk for each k. Each θi is independently set either to c or 0.05c, with probability 0.5 each, with the value of c chosen so that the resulting average degree of the network is 50. To determine the number of communities, we applied the edge cross-validation method with average stability selection proposed by Li et al. (2020), which selected K = 4. |