reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Analysis of Subset-Selection Based L_p Low-Rank Approximation

Authors: Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep K. Ravikumar

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the low rank approximation problem of any given matrix A over Rn m and Cn m in entry-wise ℓp loss, that is, ﬁnding a rank-k matrix X such that A X p is minimized. Unlike the traditional ℓ2 setting, this particular variant is NP-Hard. We show that the algorithm of column subset selection, which was an algorithmic foundation of many existing algorithms, enjoys approximation ratio (k + 1)1/p for 1 p 2 and (k + 1)1 1/p for p 2. This improves upon the previous O(k + 1) bound for p 1 [1]. We complement our analysis with lower bounds; these bounds match our upper bounds up to constant 1 when p 2. At the core of our techniques is an application of Riesz-Thorin interpolation theorem from harmonic analysis, which might be of independent interest to other algorithmic designs and analysis more broadly.
Researcher Affiliation	Academia	Chen Dan Carnegie Mellon University EMAIL Hong Wang Princeton University EMAIL Hongyang Zhang Toyota Technological Institute at Chicago EMAIL Yuchen Zhou University of Wisconsin, Madison EMAIL Pradeep Ravikumar Carnegie Mellon University EMAIL
Pseudocode	Yes	Algorithm 1 A cp,k approximation to problem (1) by column subset selection. Algorithm 2 [1] SELECTCOLUMNS (k, A): Selecting O(k log m) columns of A. Algorithm 3 [1]An algorithm that transforms an O(k log m)-rank matrix factorization into a k-rank matrix factorization without inﬂating the error too much.
Open Source Code	No	The paper does not provide any statement or link regarding the availability of open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not conduct experiments with specific datasets. Therefore, no information about publicly available training datasets or access is provided.
Dataset Splits	No	The paper is theoretical and does not conduct experiments with dataset splits. Therefore, no information about validation splits is provided.
Hardware Specification	No	The paper is theoretical and does not describe any experiments. Therefore, no hardware specifications for running experiments are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe any experiments requiring specific software dependencies with version numbers for reproducibility.
Experiment Setup	No	The paper is theoretical and does not describe any experiments or their setup. Therefore, no specific experimental setup details like hyperparameters or training configurations are provided.