Streaming Principal Component Analysis From Incomplete Data

Authors: Armin Eftekhari, Gregory Ongie, Laura Balzano, Michael B. Wakin

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section consists of two parts: first, we empirically study the dependence of SNIPE on various parameters, and second we compare SNIPE with existing algorithms for streaming subspace estimation with missing data. In all simulations, we consider an $r$-dimensional subspace $S \subset \mathbb{R}^n$ and a sequence of generic vectors $\{s_t\}_{t=1}^T \subset S$. Each entry of these vectors is observed with probability $p \in (0, 1]$ and collected in vectors $\{y_t\}_{t=1}^T \subset \mathbb{R}^n$. Our objective is to estimate $S$ from $\{y_t\}$, as described in Section 1.
Researcher Affiliation | Academia | Armin Eftekhari, EMAIL, Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, VD 1015, Switzerland; Gregory Ongie, EMAIL, Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Laura Balzano, EMAIL, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; Michael B. Wakin, EMAIL, Department of Electrical Engineering, Colorado School of Mines, Golden, CO 80401, USA
Pseudocode | Yes | Algorithm 1: SNIPE for streaming PCA from incomplete data
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It discusses and compares with other algorithms, but does not provide its own implementation.
Open Datasets | No | In all simulations, we consider an $r$-dimensional subspace $S \subset \mathbb{R}^n$ and a sequence of generic vectors $\{s_t\}_{t=1}^T \subset S$. Each entry of these vectors is observed with probability $p \in (0, 1]$ and collected in vectors $\{y_t\}_{t=1}^T \subset \mathbb{R}^n$. Our objective is to estimate $S$ from $\{y_t\}$, as described in Section 1. This indicates the use of synthetic data generation rather than a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes generating synthetic data for its simulations with parameters like $n$, $r$, $b$, and $T$, but does not refer to external datasets with specified training, validation, or test splits. For example: "For various values of probability $p$, we run SNIPE with block size $b = 2r = 10$ and scope of $T = 500r = 2500$, recording the average estimation error $d_G(S, \widehat{S}_K)$ over 50 trials, see (10)."
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | We first set $n = 100$, $r = 5$, and let $S$ be a generic $r$-dimensional subspace... For various values of probability $p$, we run SNIPE with block size $b = 2r = 10$ and scope of $T = 500r = 2500$, recording the average estimation error $d_G(S, \widehat{S}_K)$ over 50 trials... This time, we set $r = 5$, $p = 3r/n$, $b = 2r$, $T = 500r$, and vary the ambient dimension $n$. Next we set $n = 100$, $r = 5$, $p = 3r/n$, $T = 500r$, and vary the block size $b$.
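Since no code or dataset is released, the synthetic setup quoted above has to be regenerated by the reader. The following is a minimal sketch of one way to do that in NumPy, not the authors' implementation. Several choices are assumptions that the summary does not specify: the "generic" subspace is drawn via a QR factorization of a Gaussian matrix, the "generic" vectors use Gaussian coefficients, unobserved entries are zero-filled alongside an explicit mask, and the function names `generate_incomplete_stream` and `split_into_blocks` are hypothetical.

```python
import numpy as np

def generate_incomplete_stream(n=100, r=5, p=0.15, T=2500, seed=0):
    """Draw T vectors from a random r-dimensional subspace of R^n and
    observe each entry independently with probability p."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a random subspace (assumed sampling scheme).
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    # Columns s_t = U q_t lie in the subspace; Gaussian q_t is an assumption.
    S = U @ rng.standard_normal((r, T))
    # Bernoulli(p) observation pattern, one draw per entry.
    mask = rng.random((n, T)) < p
    # Zero-fill unobserved entries (assumed convention); the mask marks real data.
    Y = np.where(mask, S, 0.0)
    return U, S, Y, mask

def split_into_blocks(Y, mask, b):
    """Group consecutive columns into non-overlapping blocks of size b."""
    K = Y.shape[1] // b
    return [(Y[:, k * b:(k + 1) * b], mask[:, k * b:(k + 1) * b]) for k in range(K)]

# Parameters taken from the quoted setup: n = 100, r = 5, p = 3r/n, b = 2r, T = 500r.
n, r = 100, 5
U, S, Y, mask = generate_incomplete_stream(n=n, r=r, p=3 * r / n, T=500 * r)
blocks = split_into_blocks(Y, mask, b=2 * r)
```

With these parameters the stream splits into 250 blocks of 10 columns each. Averaging an error metric over 50 trials, as the paper does, would amount to repeating the call with 50 different `seed` values; the metric itself (equation (10) of the paper) is not reproduced here.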