reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Randomized Spectral Co-Clustering for Large-Scale Directed Networks

Authors: Xiao Guo, Yixuan Qiu, Hai Zhang, Xiangyu Chang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerically, we design and conduct simulations to support our theoretical results and test the eﬃciency of the algorithms on real networks with up to millions of nodes. A publicly available R package Rand Clust is developed for better usability and reproducibility of the proposed methods.
Researcher Affiliation	Academia	Xiao Guo EMAIL Center for Modern Statistics School of Mathematics Northwest University, Xi an, China Yixuan Qiu EMAIL School of Statistics and Management Shanghai University of Finance and Economics, Shanghai, China Hai Zhang EMAIL Center for Modern Statistics School of Mathematics Northwest University, Xi an, China Xiangyu Chang EMAIL Center for Intelligent Decision-Making and Machine Learning School of Management Xi an Jiaotong University, Xi an, China
Pseudocode	Yes	Algorithm 1 Spectral co-clustering with k-means Input: Adjacency matrix A Rn n of a directed network, number of row clusters Ky, and number of column clusters Kz (Ky Kz). 1: Compute the partial SVD of A, with left and right singular vectors U Rn Ky and V Rn Ky. 2: Run k-means on U with Ky target clusters and on V with Kz clusters. 3: Output the co-clustering results. Algorithm 2 Spectral co-clustering with spherical k-means Input: Adjacency matrix A Rn n of a directed network, number of row clusters Ky, and number of column clusters Kz (Ky Kz). 1: Compute the partial SVD of A, with left and right singular vectors U Rn Ky and V Rn Ky. 2: Construct U and V , whose rows are normalized rows of U and V , respectively. The zero rows are remained the same. 3: Run k-means on U with Ky target clusters and on V with Kz clusters. 4: Output the co-clustering results.
Open Source Code	Yes	A publicly available R package Rand Clust is developed for better usability and reproducibility of the proposed methods. ... 1. https://github.com/Xiao Guo-stat/Rand Clust
Open Datasets	Yes	We consider two directed networks, one is the statisticians citation network (Ji and Jin, 2016) and the other is the European email network (Yin et al., 2017). ... Table 3: A summary of the ﬁve real large-scale networks. Data No. of nodes No. of edges Target rank Epinions social network (Richardson et al., 2003) 75,877 508,836 3 Slashdot social network (Leskovec et al., 2009) 77,360 905,468 5 Berkeley-Stanford web network (Leskovec et al., 2009) 654,782 7,499,425 4 Wikipedia top categories network(Yin et al., 2017) 1,791,489 28,511,807 5 Wikipedia talk network (Leskovec et al., 2010) 2,388,953 5,018,445 3
Dataset Splits	No	The paper describes simulation model setups and evaluates algorithms on existing real-world networks without mentioning specific training, testing, or validation splits. It focuses on misclustering rates and approximation errors on the full networks or simulated data.
Hardware Specification	Yes	A machine with Intel Core i9-9900K CPU 3.60GHz, 32GB memory, and 64-bit WS operating-system, and R version 4.2.2 is used for all computations.
Software Dependencies	Yes	A machine with Intel Core i9-9900K CPU 3.60GHz, 32GB memory, and 64-bit WS operating-system, and R version 4.2.2 is used for all computations. ... We use the R package irlba to compute the singular vector iteratively after the sampling step. ... svds in R package RSpectra (Qiu and Mei, 2019) and the augmented implicitly restarted Lanczos bidiagonalization algorithm (Baglama and Reichel, 2005) (irlba in R package irlba (Baglama et al., 2019)).
Experiment Setup	Yes	In the random projection scheme, the oversampling parameter is 10, the power parameter is 2, and the test matrices are generated with i.i.d. standard Gaussian entries. In the random sampling scheme, the sampling rate is 0.7. ... For RP-SCC (RP-Ss CC), the oversampling parameter is 10, the power parameter is 2... For RS-SCC (RS-Ss CC), the sampling rate is 0.8... For svds and irlba, the tolerance parameter is set to be 10 5. For RB-Krylov, the power parameter is also 2... For RP, the power parameter is 1 and the oversampling parameter is 5... For RS, the sampling parameter is 0.7... For RB-Krylov, the power parameter is 1. For rsvd, the power and oversampling parameters are both the same with those in RP.