Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

Authors: Haroon Raja, Waheed Bajwa

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, experiments are performed over synthetic and real-world data to validate the convergence behavior of D-Krasulina and DM-Krasulina in high-rate streaming settings.
Researcher Affiliation | Academia | Haroon Raja, Department of Electrical and Computer Engineering, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA; Waheed U. Bajwa, Department of Electrical and Computer Engineering and Department of Statistics, Rutgers University-New Brunswick, Piscataway, NJ 08854, USA.
Pseudocode | Yes | Algorithm 1: Distributed Krasulina's Method (D-Krasulina); Algorithm 2: Distributed Mini-batch Krasulina's Method (DM-Krasulina).
Open Source Code | No | The paper discusses the methodology and presents numerical results from experiments, but it makes no explicit statement about releasing source code and includes no link to a code repository.
Open Datasets | Yes | Our first set of experiments is for the MNIST dataset (LeCun, 1998) ... and the Higgs dataset (Baldi et al., 2014).
Dataset Splits | No | The paper focuses on streaming PCA, where data arrive continuously and the algorithms process samples sequentially rather than using traditional train/test/validation splits. Total sample counts are provided (e.g., T = 10^6 samples for synthetic data, T = 6 × 10^4 for MNIST, and T = 1.1 × 10^7 for Higgs), but no explicit training, validation, or test splits are mentioned.
Hardware Specification | No | The paper mentions running experiments on "low-cost compute machines" but does not give specific details such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages or libraries such as Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | In all the following experiments we use a step size of the form γ_t = c/t. We performed experiments with multiple values of c and report results for the value of c that achieves the best convergence rate. Further details about each experiment are provided in the subsequent sections. As predicted by Corollary 1, after T/B iterations of DM-Krasulina the error Ψ_{T/B} is on the order of O(1/T) for B ∈ {1, 10, 100, 500, 1000}, while for B = 2000 the error Ψ_{T/B} is no longer optimal.
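To make the experiment-setup row concrete, the following is a minimal single-machine sketch of a mini-batch Krasulina update for streaming PCA with the step-size schedule γ_t = c/t described above. This is not the authors' distributed D-Krasulina/DM-Krasulina implementation (which splits each mini-batch across nodes); the function name `minibatch_krasulina` and its interface are assumptions for illustration.

```python
import numpy as np

def minibatch_krasulina(stream, d, c=0.1, seed=0):
    """Sketch of a mini-batch Krasulina iteration for the top eigenvector.

    stream yields (B, d) arrays of samples; c sets the step size c/t.
    Hypothetical helper, not the paper's distributed algorithm.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, X in enumerate(stream, start=1):
        gamma = c / t                      # step size gamma_t = c/t
        Xw = X @ w                         # (B,) projections of the batch
        grad = X.T @ Xw / X.shape[0]       # empirical covariance times w
        rayleigh = (w @ grad) / (w @ w)    # Rayleigh-quotient estimate
        w = w + gamma * (grad - rayleigh * w)  # Krasulina update
    return w / np.linalg.norm(w)
```

Because the update direction is orthogonal to the current iterate, the norm of `w` stays controlled and only the direction is driven toward the leading eigenvector; larger mini-batches B reduce gradient noise per iteration at the cost of fewer iterations for a fixed sample budget T, which is the trade-off behind the B = 2000 degradation noted above.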