Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

Authors: Haroon Raja, Waheed Bajwa

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, experiments are performed over synthetic and real-world data to validate the convergence behavior of D-Krasulina and DM-Krasulina in high-rate streaming settings.
Researcher Affiliation | Academia | Haroon Raja, Department of Electrical and Computer Engineering, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA; Waheed U. Bajwa, Department of Electrical and Computer Engineering and Department of Statistics, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA.
Pseudocode | Yes | Algorithm 1: Distributed Krasulina's Method (D-Krasulina); Algorithm 2: Distributed Mini-batch Krasulina's Method (DM-Krasulina).
Open Source Code | No | The paper discusses the methodology and presents numerical results from experiments, but it makes no explicit statement about releasing source code and includes no link to a code repository.
Open Datasets | Yes | Our first set of experiments is for the MNIST dataset (LeCun, 1998) ... and the Higgs dataset (Baldi et al., 2014).
Dataset Splits | No | The paper focuses on streaming PCA, where data arrive continuously and the algorithms process samples sequentially rather than using traditional train/test/validation splits. Total sample counts are provided (e.g., T = 10^6 samples for synthetic data, T = 6 × 10^4 for MNIST, and T = 1.1 × 10^7 for Higgs), but no explicit training, validation, or test splits are mentioned.
Hardware Specification | No | The paper mentions running experiments on "low-cost compute machines" but does not give specific details such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages or libraries such as Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | In all the following experiments we use a step size of the form γ_t = c/t. We performed experiments with multiple values of c and report results for the value of c that achieves the best convergence rate. Further details about each experiment are provided in the subsequent sections. As predicted by Corollary 1, after T/B iterations of DM-Krasulina the error Ψ_{T/B} is on the order of O(1/T) for B ∈ {1, 10, 100, 500, 1000}, while for B = 2000 the error Ψ_{T/B} is no longer optimal.
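To make the experiment-setup row concrete, the following is a minimal single-machine sketch of a mini-batch Krasulina update for streaming PCA with the step-size schedule γ_t = c/t described above. This is not the authors' distributed D-Krasulina/DM-Krasulina implementation (which splits each mini-batch across nodes); the function name `minibatch_krasulina` and its interface are assumptions for illustration.

```python
import numpy as np

def minibatch_krasulina(stream, d, c=0.1, seed=0):
    """Sketch of a mini-batch Krasulina iteration for the top eigenvector.

    stream yields (B, d) arrays of samples; c sets the step size c/t.
    Hypothetical helper, not the paper's distributed algorithm.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, X in enumerate(stream, start=1):
        gamma = c / t                      # step size gamma_t = c/t
        Xw = X @ w                         # (B,) projections of the batch
        grad = X.T @ Xw / X.shape[0]       # empirical covariance times w
        rayleigh = (w @ grad) / (w @ w)    # Rayleigh-quotient estimate
        w = w + gamma * (grad - rayleigh * w)  # Krasulina update
    return w / np.linalg.norm(w)
```

Because the update direction is orthogonal to the current iterate, the norm of `w` stays controlled and only the direction is driven toward the leading eigenvector; larger mini-batches B reduce gradient noise per iteration at the cost of fewer iterations for a fixed sample budget T, which is the trade-off behind the B = 2000 degradation noted above.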