Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization
Authors: Nicolas Gillis, Robert Luce
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we mainly focus on the robustness performance of the different algorithms and first compare them on synthetic data sets. We also compare them on the popular swimmer data set. Comparison on large-scale real-world data sets would require dedicated implementations, such as the parallel first-order method proposed by Bittorf et al. (2012) for the LP (2), and is a topic for further research. |
| Researcher Affiliation | Academia | Nicolas Gillis EMAIL Department of Mathematics and Operational Research Faculté Polytechnique, Université de Mons Rue de Houdain 9, 7000 Mons, Belgium Robert Luce EMAIL Institut für Mathematik, MA 3-3 Technische Universität Berlin Straße des 17. Juni 136 10623 Berlin, Germany |
| Pseudocode | Yes | Algorithm 1 Hottopixx Extracting Columns of a Noisy Separable Matrix using Linear Optimization (Bittorf et al., 2012) Algorithm 2 Extracting Columns of a Noisy Separable Matrix using Linear Optimization Algorithm 3 Extracting Columns of a Noisy Separable Matrix using Linear Optimization Algorithm 4 Post-Processing Clustering Diagonal Entries of X (Gillis, 2013) Algorithm 5 Extracting Columns of a Noisy Separable Matrix with Outliers using Linear Optimization Algorithm 6 Hybrid Post-Processing for LP-based Near-Separable NMF Algorithms |
| Open Source Code | Yes | The code for all algorithms is available at https://sites.google.com/site/nicolasgillis/code. |
| Open Datasets | Yes | Finally we scale the resulting matrix N by a scalar such that ∥N∥₁ = ϵ. In order to avoid a bias towards the natural ordering, the columns of M are permuted at random in a last step. ...The swimmer data set is a widely used data set for benchmarking NMF algorithms (Donoho and Stodden, 2003). |
| Dataset Splits | No | No explicit train/test/validation splits are mentioned. The paper describes generating '25 data sets' for each noise level to average results, but this is for evaluation iteration, not data partitioning for model training. |
| Hardware Specification | Yes | The LPs have been solved using the IBM ILOG CPLEX Optimizer2 on a standard Linux box. Because of the greater complexity of the LP-based approaches (formulating (2) and (8) as LPs requires n² + mn variables), the size of the input data matrices allowed on a standard machine is limited, roughly mn² ≤ 10⁶ (for example, on a two-core machine with 2.99GHz and 2GB of RAM, it already takes about one minute to process a 100-by-100 matrix using CPLEX). |
| Software Dependencies | No | The paper mentions "IBM ILOG CPLEX Optimizer" and "Matlab's rand function" without specifying version numbers for either. A link to CPLEX is provided, but no explicit version is stated in the text. |
| Experiment Setup | Yes | In all experiments the problem dimensions are fixed to m = 50, n = 100 and r = 10. We conducted our experiments with six different data models. As we will describe next, the models differ in the way the factor H is constructed and the sparsity of the noise matrix N. Given a desired noise level ϵ, the noisy r-separable matrix M̃ = M + N = WH + N is generated as follows: The entries of W are drawn uniformly at random from the interval [0, 1] (using Matlab's rand function). Then each column of W is normalized so that its entries sum to one. ...The vector p in the objective function was randomly generated using the randn function of Matlab. |
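The quoted setup (uniform W with column sums one, separable structure M = WH, noise scaled so ∥N∥₁ = ϵ, random column permutation) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the paper uses six different constructions of H and of the sparsity pattern of N, whereas this sketch picks one plausible choice (dense Gaussian noise, remaining columns of H drawn as random convex combinations) purely for illustration.

```python
import numpy as np

def make_noisy_separable(m=50, n=100, r=10, eps=0.1, seed=None):
    """Generate a noisy r-separable matrix M~ = WH + N (illustrative sketch).

    The dimensions m=50, n=100, r=10 match the paper's experiments; the
    specific models for H and N below are assumptions, not the paper's six
    data models.
    """
    rng = np.random.default_rng(seed)

    # W: entries uniform in [0, 1], then each column normalized to sum to one.
    W = rng.random((m, r))
    W /= W.sum(axis=0)

    # H = [I_r, H']: the identity block makes M r-separable; the remaining
    # columns are random convex combinations of the columns of W (assumed).
    Hp = rng.random((r, n - r))
    Hp /= Hp.sum(axis=0)
    H = np.hstack([np.eye(r), Hp])
    M = W @ H

    # Noise scaled so that the induced 1-norm (max absolute column sum)
    # equals eps; whether the paper uses this or the entrywise 1-norm is
    # not clear from the quoted text.
    N = rng.standard_normal((m, n))
    N *= eps / np.linalg.norm(N, 1)

    # Permute columns at random to avoid bias toward the natural ordering.
    perm = rng.permutation(n)
    return (M + N)[:, perm], W, perm
```

With this generator, averaging an algorithm's recovery rate over 25 draws per noise level ϵ mirrors the evaluation loop described in the Dataset Splits row.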