reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Streaming Principal Component Analysis From Incomplete Data

Authors: Armin Eftekhari, Gregory Ongie, Laura Balzano, Michael B. Wakin

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section consists of two parts: first, we empirically study the dependence of SNIPE on various parameters, and second we compare SNIPE with existing algorithms for streaming subspace estimation with missing data. In all simulations, we consider an r-dimensional subspace S ⊂ R^n and a sequence of generic vectors {s_t}_{t=1}^T ⊂ S. Each entry of these vectors is observed with probability p ∈ (0, 1] and collected in vectors {y_t}_{t=1}^T ⊂ R^n. Our objective is to estimate S from {y_t}, as described in Section 1.
Researcher Affiliation	Academia	Armin Eftekhari EMAIL Institute of Electrical Engineering École Polytechnique Fédérale de Lausanne Lausanne, VD 1015, Switzerland Gregory Ongie EMAIL Department of Statistics University of Chicago Chicago, IL 60637, USA Laura Balzano EMAIL Department of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI 48109, USA Michael B. Wakin EMAIL Department of Electrical Engineering Colorado School of Mines Golden, CO 80401, USA
Pseudocode	Yes	Algorithm 1 SNIPE for streaming PCA from incomplete data
Open Source Code	No	The paper does not include an explicit statement about releasing source code for the methodology described in this paper, nor does it provide a link to a code repository. It discusses and compares with other algorithms, but does not provide its own implementation.
Open Datasets	No	In all simulations, we consider an r-dimensional subspace S ⊂ R^n and a sequence of generic vectors {s_t}_{t=1}^T ⊂ S. Each entry of these vectors is observed with probability p ∈ (0, 1] and collected in vectors {y_t}_{t=1}^T ⊂ R^n. Our objective is to estimate S from {y_t}, as described in Section 1. This indicates the use of synthetic data generation rather than a publicly available dataset with concrete access information.
Dataset Splits	No	The paper describes generating synthetic data for its simulations with parameters like n, r, b, and T, but does not refer to external datasets with specified training, validation, or test splits. For example: "For various values of probability p, we run SNIPE with block size b = 2r = 10 and scope of T = 500r = 2500, recording the average estimation error d_G(S, Ŝ_K) over 50 trials, see (10)."
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup	Yes	We first set n = 100, r = 5, and let S be a generic r-dimensional subspace... For various values of probability p, we run SNIPE with block size b = 2r = 10 and scope of T = 500r = 2500, recording the average estimation error d_G(S, Ŝ_K) over 50 trials... This time, we set r = 5, p = 3r/n, b = 2r, T = 500r, and vary the ambient dimension n. Next we set n = 100, r = 5, p = 3r/n, T = 500r, and vary the block size b.
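
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is not the authors' code: the function name `generate_incomplete_stream`, the Gaussian construction of the "generic" subspace, and the choice to zero-fill unobserved entries are all assumptions made for illustration. The defaults follow the quoted values n = 100, r = 5, T = 500r = 2500, and p = 3r/n = 0.15. The sketch only generates the incomplete data stream {y_t}; it does not implement SNIPE or the error metric d_G.

```python
import random


def generate_incomplete_stream(n=100, r=5, T=2500, p=0.15, seed=0):
    """Hypothetical helper (not from the paper): draw T vectors from a random
    r-dimensional subspace of R^n and observe each entry with probability p."""
    rng = random.Random(seed)
    # Assumption: the columns of an n x r Gaussian matrix span a generic subspace S.
    basis = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    stream = []
    for _ in range(T):
        # s_t is a random linear combination of the basis columns, so s_t lies in S.
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(r)]
        s = [sum(row[j] * coeffs[j] for j in range(r)) for row in basis]
        # Each entry is kept independently with probability p.
        mask = [rng.random() < p for _ in range(n)]
        # Assumption: unobserved entries are stored as 0.0 alongside the mask.
        y = [value if seen else 0.0 for value, seen in zip(s, mask)]
        stream.append((y, mask))
    return stream
```

Each element of the returned list is a pair `(y, mask)`, where `mask[i]` records whether entry `i` of the underlying vector was observed. Fixing `seed` makes a run repeatable, which matters when averaging an error measure over many trials as the quoted setup does.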