Recovering PCA and Sparse PCA via Hybrid-(l1,l2) Sparse Sampling of Data Elements

Authors: Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on synthetic, image, text, biological, and financial data show that not only are we able to recover PCA and sparse PCA from incomplete data, but we can speed up such computations significantly using our sparse sketch.
Researcher Affiliation | Collaboration | Abhisek Kundu (EMAIL), Intel Parallel Computing Labs, Intel Tech (I) Pvt Ltd, Devarabeesanhalli, Outer Ring Road, Bangalore 560103, India; Petros Drineas (EMAIL), Computer Science, Purdue University, West Lafayette, IN 47907, USA; Malik Magdon-Ismail (EMAIL), Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
Pseudocode | Yes | Algorithm 1: Element-wise Matrix Sparsification; Algorithm 2: Approximation of PCA from Data Samples; Algorithm 3: One-pass Hybrid Sampling; Algorithm 4: Estimating α from Samples; SELECT-s Algorithm (Appendix F)
Open Source Code | No | The paper does not explicitly provide a link to open-source code or state that code is made available. The license covers the paper itself, not necessarily the implementation.
Open Datasets | Yes | Tech TC Datasets: (Gabrilovich and Markovitch 2004) ... Digit Data: (Hull 1994) ... Gene Expression Data: We use GSE10072 gene expression data for lung cancer from the NCBI Gene Expression Omnibus database.
Dataset Splits | No | The paper mentions using various datasets but does not provide specific training/test/validation splits, percentages, or sample counts for reproduction.
Hardware Specification | No | The paper reports computation times and performance comparisons using MATLAB functions but does not specify the hardware (CPU, GPU, memory, etc.) on which the experiments were run.
Software Dependencies | No | The paper mentions using the "MATLAB function svds(A,k)" and the "Spasm toolbox of Sjöstrand et al. (2012)" but does not give version numbers for MATLAB or the Spasm toolbox, which would be necessary for reproducibility.
Experiment Setup | Yes | Table 1 summarizes α for various data sets. Achlioptas et al. (2013a) argued that, for rs0 > rs1, ℓ1 sampling is better than ℓ2 sampling (even with truncation). Our values of α in Table 1 agree with this condition (α = 1 corresponds to pure ℓ1 sampling). Moreover, our method can derive the right blend of ℓ1 and ℓ2 sampling even when the above condition fails. In this sense, we generalize the results of Achlioptas et al. (2013a).
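The Pseudocode row lists element-wise matrix sparsification (Algorithm 1) with hybrid ℓ1/ℓ2 sampling. The following is a minimal Python sketch, not the authors' code. It assumes a hybrid distribution p_ij = α|A_ij|/‖A‖_1 + (1 − α)A_ij²/‖A‖_F², s entries drawn i.i.d. with replacement, and the standard unbiased rescaling A_ij/(s·p_ij). The function names hybrid_probabilities and hybrid_sparsify are hypothetical.

```python
import numpy as np
from scipy import sparse


def hybrid_probabilities(A, alpha):
    """Assumed hybrid distribution: alpha * l1 part + (1 - alpha) * l2 part."""
    absA = np.abs(A)
    l1_part = absA / absA.sum()        # |A_ij| / ||A||_1
    l2_part = A * A / (A * A).sum()    # A_ij^2 / ||A||_F^2
    return alpha * l1_part + (1.0 - alpha) * l2_part


def hybrid_sparsify(A, s, alpha, rng=None):
    """Sample s entries i.i.d. with replacement and rescale so that E[S] = A.

    Assumes A is a dense array with at least one nonzero entry.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(rng)
    m, n = A.shape
    p = hybrid_probabilities(A, alpha).ravel()
    idx = rng.choice(m * n, size=s, replace=True, p=p)
    rows, cols = np.unravel_index(idx, (m, n))
    # Each draw contributes A_ij / (s * p_ij); zero entries have p_ij = 0 and are never drawn.
    vals = A[rows, cols] / (s * p[idx])
    # Converting COO to CSR sums the contributions of repeated draws of the same entry.
    return sparse.coo_matrix((vals, (rows, cols)), shape=(m, n)).tocsr()


A = np.array([[1.0, -2.0, 0.0],
              [0.5, 3.0, -1.0],
              [0.0, 0.25, 4.0]])
S = hybrid_sparsify(A, s=20, alpha=0.5, rng=0)
print(S.nnz, "nonzeros kept out of", np.count_nonzero(A))
```

Because each sampled entry is divided by s·p_ij, the sketch S equals A in expectation. With a large s, S.toarray() approaches A entrywise.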
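The Software Dependencies row cites MATLAB's svds(A,k). In Python, scipy.sparse.linalg.svds plays the same role: it computes the top-k singular triplets of a sparse matrix without densifying it. The sketch below applies it to a random sparse stand-in rather than one of the paper's datasets. The helper top_k_components is hypothetical. As with a plain svds(A,k) call, it performs no mean-centering.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds


def top_k_components(S, k):
    """Top-k singular triplets of a sparse matrix, largest singular value first."""
    U, sigma, Vt = svds(S, k=k)
    order = np.argsort(sigma)[::-1]  # svds does not promise descending order
    return U[:, order], sigma[order], Vt[order, :]


# Stand-in sparse matrix (random, not one of the paper's datasets).
S = sparse.random(200, 50, density=0.1, format="csr", random_state=0)
U, sigma, Vt = top_k_components(S, k=5)
print(sigma)
```

The rows of Vt span the approximate top-k principal subspace of the columns. Comparing them against the same quantity computed from the full data matrix is one way to judge how well a sparse sketch preserves PCA.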
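The Experiment Setup row treats α as the blend between ℓ1 sampling (α = 1) and ℓ2 sampling. The following illustration is not the paper's analysis. It assumes the hybrid distribution p_ij = α|A_ij|/‖A‖_1 + (1 − α)A_ij²/‖A‖_F² and shows one reason a blend can help. Under pure ℓ1 sampling, every rescaled draw |A_ij|/p_ij equals ‖A‖_1. Under pure ℓ2 sampling, the same quantity is ‖A‖_F²/|A_ij|, which explodes for tiny entries. For any α > 0 it is capped at ‖A‖_1/α. The helper names are hypothetical.

```python
import numpy as np


def hybrid_probabilities(A, alpha):
    """Assumed hybrid distribution: alpha * |A_ij|/||A||_1 + (1 - alpha) * A_ij^2/||A||_F^2."""
    absA = np.abs(A)
    return alpha * absA / absA.sum() + (1.0 - alpha) * A**2 / (A**2).sum()


def max_rescaled_magnitude(A, alpha):
    """Largest |A_ij| / p_ij over nonzero entries, i.e. the biggest single rescaled draw."""
    p = hybrid_probabilities(A, alpha)
    nz = A != 0
    return np.max(np.abs(A[nz]) / p[nz])


# A matrix with a few large entries and some tiny ones.
A = np.array([[10.0, 0.01],
              [0.01, 5.0]])
for alpha in (0.0, 0.1, 0.5, 1.0):
    print(f"alpha={alpha}: max |A_ij|/p_ij = {max_rescaled_magnitude(A, alpha):.2f}")
```

For this toy matrix, pure ℓ2 sampling produces a rescaled draw several hundred times larger than pure ℓ1 sampling does. Even a small ℓ1 weight such as α = 0.1 removes most of that blow-up.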