Optimal Estimation and Completion of Matrices with Biclustering Structures

Authors: Chao Gao, Yu Lu, Zongming Ma, Harrison H. Zhou

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Implementation and simulation results are given in Section 5." "Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data."
Researcher Affiliation | Academia | Chao Gao (Yale University), Yu Lu (Yale University), Zongming Ma (University of Pennsylvania), Harrison H. Zhou (Yale University)
Pseudocode | Yes | "Algorithm 1: A Biclustering Algorithm" (a hedged sketch of such an algorithm is given after the table)
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository for the described methodology.
Open Datasets | No | "Now we present some numerical results to demonstrate the accuracy of the error rate behavior suggested by Theorem 1 on simulated data. We first generate our data from SBM with the number of blocks k ∈ {2, 4, 8, 16}. The observation rate is p = 0.5."
Dataset Splits | No | The paper describes generating simulated data for numerical studies but does not specify any training/test/validation dataset splits. The evaluation involves comparing the estimator's error rate behavior on these generated datasets.
Hardware Specification | No | The paper provides numerical results on simulated data but does not mention any specific hardware used for running these simulations or experiments.
Software Dependencies | No | The paper mentions algorithms such as k-means and singular value decomposition but does not specify any software packages or version numbers used for implementation or analysis.
Experiment Setup | Yes | "Our theoretical result indicates the rate of recovery is √((ρ/p)(k²/n² + (log k)/n)) for the root mean squared error (RMSE) (1/n)‖θ̂ − θ‖. When k is not too large, the dominating term is √(ρ log k/(pn)). We are going to confirm this rate by simulation. We first generate our data from SBM with the number of blocks k ∈ {2, 4, 8, 16}. The observation rate is p = 0.5. For every fixed k, we use four different Q = 0.5·1_k 1_k^T + 0.1t·I_k with t = 1, 2, 3, 4 and generate the community labels z uniformly on [k]. Then we calculate the error (1/n)‖θ̂ − θ‖. Panel (a) of Figure 1 shows the error versus the sample size n. ... We simulate data with Gaussian noise under four different settings of k1 and k2. For each (k1, k2) ∈ {(4, 4), (4, 8), (8, 8), (8, 12)}, the entries of matrix Q are independently and uniformly generated from {1, 2, 3, 4, 5}. The cluster labels z1 and z2 are uniform on [k1] and [k2] respectively. After generating Q, z1 and z2, we add N(0, 1) noise to the data and observe X_ij with probability p = 0.1." (Hedged simulation sketches follow the table.)
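
Since the paper's Algorithm 1 is only named in the table, here is a minimal, hypothetical Python sketch of a spectral biclustering estimator in the same spirit: zero-fill and rescale the partially observed matrix, take a truncated SVD, cluster rows and columns with k-means, then average observed entries within each block. The function name bicluster_estimate, the rank choice, and the k-means settings are illustrative assumptions, not the paper's exact Algorithm 1.

```python
# Hypothetical sketch of a biclustering estimator (spectral step + k-means +
# block averaging). The exact steps of the paper's Algorithm 1 may differ.
import numpy as np
from sklearn.cluster import KMeans

def bicluster_estimate(X, mask, p, k1, k2):
    """Estimate a biclustered mean matrix from partially observed data.

    X      : (n1, n2) array of observed values (arbitrary where mask == 0)
    mask   : (n1, n2) binary array, 1 where X_ij is observed
    p      : observation probability
    k1, k2 : numbers of row and column clusters
    """
    n1, n2 = X.shape
    Y = np.where(mask == 1, X, 0.0) / p  # zero-fill and rescale so E[Y] = theta

    # Spectral step: keep the leading singular subspaces as embeddings.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    r = min(k1, k2)
    row_emb = U[:, :r] * s[:r]
    col_emb = Vt[:r, :].T * s[:r]

    # k-means on the embeddings yields row and column cluster labels.
    z1 = KMeans(n_clusters=k1, n_init=10).fit_predict(row_emb)
    z2 = KMeans(n_clusters=k2, n_init=10).fit_predict(col_emb)

    # Estimate each block mean by averaging its observed entries.
    theta_hat = np.zeros((n1, n2))
    for a in range(k1):
        for b in range(k2):
            block = np.ix_(z1 == a, z2 == b)
            n_obs = mask[block].sum()
            if n_obs > 0:
                theta_hat[block] = (X[block] * mask[block]).sum() / n_obs
    return theta_hat
```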
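
The SBM portion of the "Experiment Setup" row can be made concrete with the sketch below, which generates data as that row describes (Q = 0.5·1_k 1_k^T + 0.1t·I_k, labels uniform on [k], each entry observed with probability p = 0.5) and evaluates the RMSE (1/n)‖θ̂ − θ‖. The helper names, seed, and sample size n = 400 are our assumptions, and the final lines plug in the bicluster_estimate sketch above.

```python
# Hypothetical reconstruction of the SBM simulation described in the
# "Experiment Setup" row; seeds and sizes are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def simulate_sbm(n, k, t, p=0.5):
    Q = 0.5 * np.ones((k, k)) + 0.1 * t * np.eye(k)  # Q = 0.5*1 1^T + 0.1t*I
    z = rng.integers(0, k, size=n)                   # labels uniform on [k]
    theta = Q[np.ix_(z, z)]                          # theta_ij = Q_{z_i z_j}
    A = rng.binomial(1, theta)                       # Bernoulli entries
    mask = rng.binomial(1, p, size=(n, n))           # observe each entry w.p. p
    return A, mask, theta

def rmse(theta_hat, theta):
    return np.linalg.norm(theta_hat - theta) / theta.shape[0]  # (1/n)||.||_F

# One configuration (k = 4, t = 2), using the bicluster_estimate sketch above:
A, mask, theta = simulate_sbm(n=400, k=4, t=2)
theta_hat = bicluster_estimate(A, mask, p=0.5, k1=4, k2=4)
print("RMSE:", rmse(theta_hat, theta))
```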
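
The Gaussian-noise setting in the same row admits an analogous sketch; the function name, seed, and matrix dimensions are again hypothetical.

```python
# Hypothetical sketch of the Gaussian biclustering setting: Q entries uniform
# on {1, ..., 5}, labels z1 and z2 uniform on [k1] and [k2], N(0, 1) noise,
# and entries observed with probability p = 0.1.
import numpy as np

rng = np.random.default_rng(1)

def simulate_gaussian_bicluster(n1, n2, k1, k2, p=0.1):
    Q = rng.integers(1, 6, size=(k1, k2)).astype(float)  # uniform on {1,...,5}
    z1 = rng.integers(0, k1, size=n1)                    # row labels on [k1]
    z2 = rng.integers(0, k2, size=n2)                    # column labels on [k2]
    theta = Q[np.ix_(z1, z2)]                            # block-constant mean
    X = theta + rng.standard_normal((n1, n2))            # add N(0, 1) noise
    mask = rng.binomial(1, p, size=(n1, n2))             # observe w.p. p = 0.1
    return X, mask, theta

# Example: the (k1, k2) = (4, 8) configuration from the paper's grid.
X, mask, theta = simulate_gaussian_bicluster(n1=300, n2=300, k1=4, k2=8)
```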