Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis
Authors: Haroon Raja, Waheed Bajwa
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings. |
| Researcher Affiliation | Academia | Haroon Raja (EMAIL), Department of Electrical and Computer Engineering, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA; Waheed U. Bajwa (EMAIL), Department of Electrical and Computer Engineering and Department of Statistics, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA |
| Pseudocode | Yes | Algorithm 1: Distributed Krasulina's Method (D-Krasulina); Algorithm 2: Distributed Mini-batch Krasulina's Method (DM-Krasulina) |
| Open Source Code | No | The paper discusses the methodology and presents numerical results from experiments but does not provide any explicit statements about the release of source code, nor does it include links to a code repository. |
| Open Datasets | Yes | Our first set of experiments is for the MNIST dataset (Le Cun, 1998) ... and the Higgs dataset (Baldi et al., 2014). |
| Dataset Splits | No | The paper focuses on streaming PCA, where data arrives continuously, and the algorithms process samples sequentially rather than using traditional train/test/validation splits. While total sample counts are provided (e.g., T = 10^6 samples for synthetic, T = 6 * 10^4 for MNIST, 1.1 * 10^7 for Higgs), there is no mention of explicit dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper mentions running experiments on "low-cost compute machines" but does not provide specific details such as GPU models, CPU types, or memory configurations used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | In all the experiments in the following we use a step size of the form γ_t = c/t. We performed experiments with multiple values of c and here we report the results for the value of c that achieves the best convergence rate. Further details about each experiment are provided in the following sections. As predicted by Corollary 1, we can see that after T/B iterations of DM-Krasulina, the error Ψ_{T/B} is on the order of O(1/T) for B ∈ {1, 10, 100, 500, 1000}, while for B = 2000 the error Ψ_{T/B} is no longer optimal. |
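The reported setup (a γ_t = c/t step-size schedule applied to mini-batches of size B) can be sketched as a single-node mini-batch Krasulina update in NumPy. This is our illustrative reconstruction, not the authors' released code; the function name, data generation, and constants below are assumptions:

```python
import numpy as np

def minibatch_krasulina(stream, w0, c=1.0):
    """Sketch of a mini-batch Krasulina-style update for streaming PCA.

    `stream` yields mini-batches X of shape (B, d); the step size follows
    the paper's gamma_t = c / t schedule. Names here are ours.
    """
    w = np.asarray(w0, dtype=float).copy()
    for t, X in enumerate(stream, start=1):
        gamma = c / t                       # gamma_t = c / t schedule
        grad = X.T @ (X @ w) / len(X)       # batch estimate of C @ w
        proj = (w @ grad) / (w @ w) * w     # component of grad along w
        w += gamma * (grad - proj)          # update orthogonal to w
    return w / np.linalg.norm(w)            # unit-norm direction estimate

# Hypothetical usage: data with a dominant coordinate direction.
rng = np.random.default_rng(0)
d, T, B = 10, 20000, 10
scales = np.ones(d)
scales[0] = np.sqrt(5.0)                    # top eigenvalue 5, rest 1
X = rng.normal(size=(T, d)) * scales
batches = (X[i:i + B] for i in range(0, T, B))
w_hat = minibatch_krasulina(batches, rng.normal(size=d), c=2.0)
```

Because the update direction is orthogonal to the current iterate w, the iterate's norm grows only slowly (the squared step sizes are summable), which is one reason this style of update is numerically stable without per-step renormalization.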