A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation
Authors: Karl Rohe, Jun Tao, Xintian Han, Norbert Binkiewicz
JMLR 2018 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A computational experiment in Section 2.4 simulates graphs up to n = 10,000,000 nodes with m = 100,000,000 edges. For example, on a graph with n = 500,000 and m = 5,000,000, fast RG runs in less than one second on a 3.5 GHz Intel i5. Keywords: Random Dot Product Graph, Edge exchangeable, Simulation |
| Researcher Affiliation | Academia | Statistics Department University of Wisconsin-Madison Madison, WI 53706, USA School of Mathematical Sciences Peking University Beijing, 100871, China Yuanpei College Peking University Beijing, 100871, China |
| Pseudocode | Yes | Algorithm 1 fast RG(X,S,Y) Require: X Rn Kx, S RKx Ky, and Y Rd Ky with all matrices containing non-negative entries. Compute diagonal matrix CX RKx Kx with CX = diag( i Xi1 , ... , i Xi Kx). Compute diagonal matrix CY RKy Ky with CY = diag( i Yi1 , ... , i Yi Ky). Define X = XC 1 X , S = CXSCY, and Y = YC 1 Y . Sample the number of edges m Poisson( u,v Suv). for ℓ= 1 : m do Sample U {1, ..., Kx},V {1, ..., Ky} with P(U = u,V = v) Suv. Sample I {1, ..., n} with P(I = i) = Xi U. Sample J {1, ..., d} with P(J = j) = Yj V. Add edge (I,J) to the graph, allowing for multiple edges (I,J). end for |
| Open Source Code | Yes | An implementation in R is available on github. Code at https://github.com/karlrohe/fast RG gives an implementation of fast RG in R. |
| Open Datasets | No | The paper describes generating graphs via simulation models (e.g., Stochastic Blockmodel, RDPG) and does not use pre-existing open datasets for its experiments. It focuses on the simulation methodology itself. |
| Dataset Splits | No | The paper describes generating synthetic graphs for experimental evaluation with varying parameters (n, E(m), K, density), not using pre-existing datasets with train/test splits. |
| Hardware Specification | Yes | For example, on a graph with n = 500,000 and m = 5,000,000, fast RG runs in less than one second on a 3.5 GHz Intel i5. In Figure 1, the vertical axes present the running time in R on a Retina 5K i Mac, 27-inch, Late 2014 with 3.5 GHz Intel i5 and 8GB of 1600 MHz DDR3 memory. The speed comparison in Figure 2 corresponds to the average running time over 10 simulations performed on a 2015 Mac Book Pro, 2.8 GHz Intel Core i7, with 16 GB 1600 MHz DDR3 running Python 3.5.2 and R 3.3.2. |
| Software Dependencies | Yes | An implementation in R is available on github. The speed comparison in Figure 2 corresponds to the average running time over 10 simulations performed on a 2015 Mac Book Pro, 2.8 GHz Intel Core i7, with 16 GB 1600 MHz DDR3 running Python 3.5.2 and R 3.3.2. |
| Experiment Setup | Yes | In all simulations X =Y and K = 5. The elements of X are independent Poisson(1) random variables and the elements of S are independent Uniform[0,1] random variables. To specify E(m), the parameter avg Deg is set to E(m)/n. The values of n range from 10,000 to 10,000,000 and the values of E(m) range from 100,000 to 100,000,000. This section investigates the graph density at which fast RG becomes slower than simulating each element Aij as a Bernoulli random variable. In Figure 3, the reported time to compute this naive (element-wise) algorithm includes both (i) the time it takes to compute the probabilities e A = X%*%B%*%t(X) and (ii) sample the edges z = rbinom(length(e A), 1, e A). The time for fast RG is for a directed graph, represented as a sparse matrix. The time to compute fast RG also includes the time it takes to construct X and B. Figure 3 compares these two approaches on a set of Stochastic Blockmodels for values K {2,5,10}, n {500,1000,5000,10000}, and graph density ρ = n 2E(m) varying between .02 and .35. In all simulations, the S matrix is proportional to IK + JK RK K, where IK is the identity matrix and JK is the matrix of ones. The scale of S is adjusted to ensure the correct density ρ. |