Latent Space Inference of Internet-Scale Networks

Authors: Qirong Ho, Junming Yin, Eric P. Xing

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our method can infer 1000 communities from a 101-million-node web graph in less than 40 hours using a small cluster of 5 machines, and that, on real-world networks with ground truth, our community recovery accuracy is competitive with or outperforms other scalable probabilistic models.
Researcher Affiliation | Academia | Qirong Ho, Institute for Infocomm Research, A*STAR, Singapore 138632. Junming Yin, Department of Management Information Systems, Eller College of Management, University of Arizona, Tucson, AZ 85721. Eric P. Xing, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
Pseudocode | Yes | Algorithm 1: Distributed-parallel Stochastic Variational Inference for STM
Open Source Code | No | The paper mentions code for baseline methods but does not provide an explicit link or statement for the open-source release of the code for their proposed STM SVI algorithm. It only states, "we develop our C++ implementation of the SVI algorithm for STM (Algorithm 1) on top of the Petuum parameter server..." without providing access to their specific implementation.
Open Datasets | Yes | We performed all our experiments on real-world networks with ground-truth communities provided by Yang and Leskovec (2012b)... These networks are available at http://snap.stanford.edu/data/.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes using ground-truth communities for evaluation and mentions processing steps like removing nodes without ground-truth assignments for NMI computation, but no standard data splits are specified.
Hardware Specification | Yes | We used server machines equipped with 128GB RAM and 2 Intel Xeon E5-2450 8-core processors, for a total of 16 CPU cores per machine running at 2.10GHz. We ran the distributed-parallel SVI algorithm using 4 such machines, for a total of 64 cores/worker threads and 512GB distributed RAM.
Software Dependencies | No | The paper mentions using C, Java, MATLAB, and the Petuum system for implementation, but does not specify any version numbers for these software components or libraries.
Experiment Setup | Yes | Termination Criterion. We monitored the convergence of the SVI algorithm by computing the variational mini-batch lower bound L_S(η, γ) (Equation 6) at each iteration... we terminated the algorithm when L_S(η, γ) decreases for the first time... All our trials on ground-truth networks terminated with high-quality results within 200 iterations under this criterion (using C = 1 triangle subsampled for each node, per iteration)... Initialization... to seed node i to community k, we initialize γ_{i,k} = 10K and the remaining non-seeded elements of γ_i to be small, randomly-generated numbers close to 1. For the variational parameters η corresponding to the triangle-generating probabilities B, we initialize them as follows: [η_{xxx,1}, η_{xxx,2}] = [1, 3], and [η_{xx,1}, η_{xx,2}, η_{xx,3}] = [2, 1, 1] and [η_{0,1}, η_{0,2}] = [3, 1]. ... Finally, we fix the hyperparameters of θ, B to α = λ = 0.1.
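The initialization and termination rule quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' C++/Petuum implementation, which is not released. The function names, the `step` callback, the `max_iters` safety cap, and the 0.01 noise scale for "close to 1" are assumptions made for illustration. The seeded value (given as "10K" in the quoted text) is left as a caller-supplied parameter rather than interpreted.

```python
import numpy as np

# Initial values for eta and the hyperparameters, copied from the quoted setup.
ETA_INIT = {"xxx": [1.0, 3.0], "xx": [2.0, 1.0, 1.0], "0": [3.0, 1.0]}
ALPHA = LAMBDA = 0.1


def init_gamma(num_nodes, num_communities, seeds, seed_value, rng):
    """Build an initial gamma matrix (hypothetical helper).

    seeds maps node index -> community index. Seeded entries receive
    seed_value; all other entries are random numbers slightly above 1
    (the 0.01 scale is an assumption, the source only says "close to 1").
    """
    gamma = 1.0 + 0.01 * rng.random((num_nodes, num_communities))
    for node, community in seeds.items():
        gamma[node, community] = seed_value
    return gamma


def run_until_bound_decreases(step, max_iters=200):
    """Run step(t) until the mini-batch lower bound first decreases.

    step is a hypothetical callback that performs one SVI iteration and
    returns the mini-batch lower bound for that iteration. max_iters is
    a safety cap added for this sketch, not part of the quoted criterion.
    Returns the list of bounds, including the first decreasing one.
    """
    history = []
    previous = -np.inf
    for t in range(max_iters):
        bound = step(t)
        history.append(bound)
        if bound < previous:  # first decrease: stop
            break
        previous = bound
    return history
```

For example, feeding the loop a fixed sequence of bounds `[-10.0, -8.0, -7.5, -7.9, -7.0]` stops at the fourth value, the first one lower than its predecessor.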