Recovering PCA and Sparse PCA via Hybrid-(l1,l2) Sparse Sampling of Data Elements
Authors: Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on synthetic, image, text, biological, and financial data show that not only are we able to recover PCA and sparse PCA from incomplete data, but we can speed up such computations significantly using our sparse sketch. |
| Researcher Affiliation | Collaboration | Abhisek Kundu (Intel Parallel Computing Labs, Intel Tech (I) Pvt Ltd, Devarabeesanhalli, Outer Ring Road, Bangalore 560103, India); Petros Drineas (Computer Science, Purdue University, West Lafayette, IN 47907, USA); Malik Magdon-Ismail (Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA) |
| Pseudocode | Yes | Algorithm 1: Element-wise Matrix Sparsification; Algorithm 2: Approximation of PCA from Data Samples; Algorithm 3: One-pass hybrid sampling; Algorithm 4: Estimating α from Samples; Appendix F: SELECT-s Algorithm |
| Open Source Code | No | The paper does not explicitly provide a link to open-source code or state that code is made available. The license is for the paper itself, not necessarily the implementation. |
| Open Datasets | Yes | Tech TC Datasets: (Gabrilovich and Markovitch 2004) ... Digit Data: (Hull 1994) ... Gene Expression Data: We use GSE10072 gene expression data for lung cancer from NCBI Gene Expression Omnibus database. |
| Dataset Splits | No | The paper mentions using various datasets but does not provide specific training/test/validation splits, percentages, or sample counts for reproduction. |
| Hardware Specification | No | The paper mentions computational time and performance comparisons using MATLAB functions but does not specify the hardware (CPU, GPU, memory, etc.) on which the experiments were run. |
| Software Dependencies | No | The paper mentions using "MATLAB function svds(A,k)" and "Spasm toolbox of Sjöstrand et al. (2012)" but does not provide specific version numbers for MATLAB or the Spasm toolbox, which would be necessary for reproducibility. |
| Experiment Setup | Yes | Table 1 summarizes α for various data sets. Achlioptas et al. (2013a) argued that, for rs0 > rs1, ℓ1 sampling is better than ℓ2 (even with truncation). Our results on α in Table 1 reproduce this condition (α = 1 implies ℓ1). Moreover, our method can derive the right blend of ℓ1 and ℓ2 sampling even when the above condition fails. In this sense, we generalize the results of Achlioptas et al. (2013a). |
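
The blend of ℓ1 and ℓ2 sampling governed by α can be illustrated with a minimal sketch. It assumes the hybrid distribution is the convex combination p_ij = α·|A_ij|/‖A‖₁ + (1−α)·A_ij²/‖A‖_F², and that the sketch is built from s i.i.d. draws with replacement, each rescaled by 1/(s·p_ij). The function names `hybrid_probabilities` and `sparsify` are hypothetical, the code is in Python with NumPy rather than the MATLAB used in the paper, and it is not the authors' implementation.

```python
import numpy as np


def hybrid_probabilities(A, alpha):
    """Element-wise sampling probabilities blending l1 and l2 weights.

    Assumed form: p_ij = alpha*|A_ij|/||A||_1 + (1-alpha)*A_ij^2/||A||_F^2.
    alpha = 1 gives pure l1 sampling, alpha = 0 gives pure l2 sampling.
    """
    abs_A = np.abs(A)
    l1 = abs_A / abs_A.sum()      # l1 weights, sum to 1
    sq_A = abs_A ** 2
    l2 = sq_A / sq_A.sum()        # l2 weights, sum to 1
    return alpha * l1 + (1.0 - alpha) * l2


def sparsify(A, s, alpha, rng=None):
    """Return a sparse sketch of A built from s sampled entries.

    Entries are drawn i.i.d. with replacement; each draw contributes
    A_ij / (s * p_ij), so repeated draws of one entry accumulate.
    """
    rng = np.random.default_rng(rng)
    p = hybrid_probabilities(A, alpha).ravel()
    idx = rng.choice(A.size, size=s, replace=True, p=p)
    flat = np.zeros(A.size)
    # np.add.at accumulates correctly when idx contains duplicates.
    np.add.at(flat, idx, A.ravel()[idx] / (s * p[idx]))
    return flat.reshape(A.shape)


if __name__ == "__main__":
    A = np.random.default_rng(0).normal(size=(20, 30))
    S = sparsify(A, s=100, alpha=0.5, rng=1)
    print("nonzeros in sketch:", np.count_nonzero(S))
```

Zero entries of A receive probability zero under this distribution and are never drawn, so the rescaling never divides by zero. A rank-k approximation could then be computed from the sketch with a sparse SVD routine, in the spirit of the `svds(A,k)` call mentioned above.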