Scale Invariant Power Iteration
Authors: Cheolmin Kim, Youngseok Kim, Diego Klabjan
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In numerical experiments, we introduce applications to independent component analysis, Gaussian mixtures, and non-negative matrix factorization with the KL-divergence. Experimental results demonstrate that SCI-PI is competitive to application-specific state-of-the-art algorithms and often yields better solutions. Keywords: scale invariance, power iteration, optimization, convergence analysis, machine learning applications |
| Researcher Affiliation | Academia | Cheolmin Kim* EMAIL Department of Industrial Engineering and Management Sciences Northwestern University Evanston, IL, 60208, USA Youngseok Kim* EMAIL Department of Statistics University of Chicago Chicago, IL, 60637, USA Diego Klabjan EMAIL Department of Industrial Engineering and Management Sciences Northwestern University Evanston, IL, 60208, USA |
| Pseudocode | Yes | Algorithm 1 SCI-PI — Input: initial point x_0 ∈ B^d; k ← 0; while ∇f(x_k) ≠ 0 do: x_{k+1} ← ∇f(x_k) / ‖∇f(x_k)‖_2; k ← k + 1; end while; Output: x_k |
| Open Source Code | Yes | A description of the data sets is provided below and source codes are available at: https://github.com/youngseok-kim/SCIPI-JMLR. |
| Open Datasets | Yes | For KL-NMF (Section 5.2), we use four public real data sets available online and summarized in Table 1. ... These four data sets are retrieved from https://www.microsoft.com/en-us/research/project, https://archive.ics.uci.edu/ml/datasets/bag+of+words, and https://snap.stanford.edu/data/wiki-Vote.html For GMM (Section 5.3), we use ten public real data sets, corresponding to all small and moderate data sets provided by the mlbench package in R. ... For ICA, discussed also in Section 5.3, we use nine public data sets (see Table 3) from the UCI Machine Learning repository. |
| Dataset Splits | No | The paper describes the datasets and their characteristics but does not explicitly provide details about training/test/validation splits with percentages, sample counts, or references to predefined splits for reproduction. For GMM, it mentions running EM and SCI-PI for the given number of classes without class labels, implying a lack of explicit train/test splits for supervised evaluation. |
| Hardware Specification | Yes | All experiments are implemented on a standard laptop (2.6 GHz Intel Core i7 processor and 16 GB memory) using the Julia programming language. |
| Software Dependencies | No | The paper states using "the Julia programming language" but does not specify its version or the versions of any other software libraries or dependencies used for implementation. |
| Experiment Setup | Yes | We estimate r = 20 factors. ... For PGD, the learning rate is optimized by grid search. ... Our method (SCI-PI): It iterates h_new ← h ∘ (σ + Wᵀz)², followed by rescaling, where σ is a shift parameter. We simply use σ = 1 for preconditioning. The stopping criterion is \|f_k − f*\| ≤ 2 × 10⁻⁶ f*, where f_k is the objective value at iteration k and f* is the solution obtained by MIXSQP after extensive computation time. ... The EM and SCI-PI updates for π can be written respectively as ... where α is a shift parameter set to 1. |
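The pseudocode row above quotes Algorithm 1, whose whole update is the normalized gradient step x_{k+1} = ∇f(x_k)/‖∇f(x_k)‖₂. A minimal sketch of that loop, in Python rather than the paper's Julia, is shown below; the quadratic objective f(x) = ½xᵀAx (for which SCI-PI reduces to classic power iteration on A) is an assumed test case, not from the paper, and `sci_pi`, `max_iter`, and `tol` are illustrative names:

```python
import numpy as np

def sci_pi(grad, x0, max_iter=1000, tol=1e-12):
    """SCI-PI sketch: iterate x_{k+1} = grad f(x_k) / ||grad f(x_k)||_2."""
    x = x0 / np.linalg.norm(x0)          # start on the unit sphere
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:                 # stationary point: gradient vanished
            break
        x_new = g / gnorm
        # stop when the iterate no longer moves (up to sign flip)
        if min(np.linalg.norm(x_new - x), np.linalg.norm(x_new + x)) < tol:
            x = x_new
            break
        x = x_new
    return x

# Assumed toy problem: f(x) = 0.5 * x^T A x, so grad f(x) = A x and the
# fixed point is the dominant eigenvector of the symmetric matrix A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = sci_pi(lambda v: A @ v, np.array([1.0, 0.0]))
```

On this toy problem the returned `x` aligns (up to sign) with the leading eigenvector of `A`, matching the paper's framing of power iteration as the special case of SCI-PI for quadratic objectives.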