Streaming Principal Component Analysis From Incomplete Data
Authors: Armin Eftekhari, Gregory Ongie, Laura Balzano, Michael B. Wakin
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section consists of two parts: first, we empirically study the dependence of SNIPE on various parameters, and second we compare SNIPE with existing algorithms for streaming subspace estimation with missing data. In all simulations, we consider an r-dimensional subspace $S \subset \mathbb{R}^n$ and a sequence of generic vectors $\{s_t\}_{t=1}^T \subset S$. Each entry of these vectors is observed with probability $p \in (0, 1]$ and collected in vectors $\{y_t\}_{t=1}^T \subset \mathbb{R}^n$. Our objective is to estimate $S$ from $\{y_t\}$, as described in Section 1. |
| Researcher Affiliation | Academia | Armin Eftekhari (EMAIL), Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, VD 1015, Switzerland; Gregory Ongie (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Laura Balzano (EMAIL), Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; Michael B. Wakin (EMAIL), Department of Electrical Engineering, Colorado School of Mines, Golden, CO 80401, USA |
| Pseudocode | Yes | Algorithm 1 SNIPE for streaming PCA from incomplete data |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It discusses and compares with other algorithms, but does not provide its own implementation. |
| Open Datasets | No | In all simulations, we consider an r-dimensional subspace $S \subset \mathbb{R}^n$ and a sequence of generic vectors $\{s_t\}_{t=1}^T \subset S$. Each entry of these vectors is observed with probability $p \in (0, 1]$ and collected in vectors $\{y_t\}_{t=1}^T \subset \mathbb{R}^n$. Our objective is to estimate $S$ from $\{y_t\}$, as described in Section 1. This indicates the use of synthetic data generation rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes generating synthetic data for its simulations with parameters like n, r, b, and T, but does not refer to external datasets with specified training, validation, or test splits. For example: "For various values of probability p, we run SNIPE with block size b = 2r = 10 and scope of T = 500r = 2500, recording the average estimation error $d_G(S, \widehat{S}_K)$ over 50 trials, see (10)." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | We first set n = 100, r = 5, and let S be a generic r-dimensional subspace... For various values of probability p, we run SNIPE with block size b = 2r = 10 and scope of T = 500r = 2500, recording the average estimation error $d_G(S, \widehat{S}_K)$ over 50 trials... This time, we set r = 5, p = 3r/n, b = 2r, T = 500r, and vary the ambient dimension n. Next we set n = 100, r = 5, p = 3r/n, T = 500r, and vary the block size b. |
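
Since the paper releases neither code nor data, the synthetic setup quoted in the table could be approximated along the following lines. This is a minimal sketch, not the authors' implementation. The function name `generate_incomplete_stream`, the Gaussian construction of the "generic" subspace and vectors, the zero-filling of unobserved entries, and the seed handling are all assumptions made for illustration. Only the parameter values (n = 100, r = 5, T = 500r, p = 3r/n) come from the quoted text, and SNIPE itself (Algorithm 1) is not reproduced here.

```python
import random


def generate_incomplete_stream(n=100, r=5, T=2500, p=0.15, seed=0):
    """Draw T vectors from a random r-dimensional subspace of R^n and
    keep each entry independently with probability p (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: the columns of a Gaussian n x r matrix span a "generic"
    # r-dimensional subspace. basis[i][k] is entry i of basis vector k.
    basis = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    vectors, masks = [], []
    for _ in range(T):
        # Random coefficients give a generic vector s_t inside the subspace.
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(r)]
        s_t = [sum(row[k] * coeffs[k] for k in range(r)) for row in basis]
        # Each entry is observed independently with probability p.
        mask = [rng.random() < p for _ in range(n)]
        # Assumption: unobserved entries are stored as zeros next to the mask.
        y_t = [x if keep else 0.0 for x, keep in zip(s_t, mask)]
        vectors.append(y_t)
        masks.append(mask)
    return basis, vectors, masks


# Parameters quoted in the Experiment Setup row: n = 100, r = 5, T = 500r, p = 3r/n.
n, r = 100, 5
basis, ys, masks = generate_incomplete_stream(n=n, r=r, T=500 * r, p=3 * r / n)
observed_fraction = sum(map(sum, masks)) / (len(masks) * n)
```

Returning the mask alongside the zero-filled vectors keeps a truly zero observation distinguishable from a missing entry, which any downstream subspace estimator would need.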