Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization
Authors: Nicolas Gillis, Robert Luce
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we mainly focus on the robustness performance of the different algorithms and first compare them on synthetic data sets. We also compare them on the popular swimmer data set. Comparison on large-scale real-world data sets would require dedicated implementations, such as the parallel first-order method proposed by Bittorf et al. (2012) for the LP (2), and is a topic for further research. |
| Researcher Affiliation | Academia | Nicolas Gillis EMAIL Department of Mathematics and Operational Research Faculté Polytechnique, Université de Mons Rue de Houdain 9, 7000 Mons, Belgium Robert Luce EMAIL Institut für Mathematik, MA 3-3 Technische Universität Berlin Straße des 17. Juni 136 10623 Berlin, Germany |
| Pseudocode | Yes | Algorithm 1 Hottopixx Extracting Columns of a Noisy Separable Matrix using Linear Optimization (Bittorf et al., 2012) Algorithm 2 Extracting Columns of a Noisy Separable Matrix using Linear Optimization Algorithm 3 Extracting Columns of a Noisy Separable Matrix using Linear Optimization Algorithm 4 Post-Processing Clustering Diagonal Entries of X (Gillis, 2013) Algorithm 5 Extracting Columns of a Noisy Separable Matrix with Outliers using Linear Optimization Algorithm 6 Hybrid Post-Processing for LP-based Near-Separable NMF Algorithms |
| Open Source Code | Yes | The code for all algorithms is available at https://sites.google.com/site/nicolasgillis/code. |
| Open Datasets | Yes | Finally we scale the resulting matrix N by a scalar such that ∥N∥₁ = ϵ. In order to avoid a bias towards the natural ordering, the columns of M are permuted at random in a last step. ...The swimmer data set is a widely used data set for benchmarking NMF algorithms (Donoho and Stodden, 2003). |
| Dataset Splits | No | No explicit train/test/validation splits are mentioned. The paper describes generating '25 data sets' for each noise level to average results, but this is for evaluation iteration, not data partitioning for model training. |
| Hardware Specification | Yes | The LPs have been solved using the IBM ILOG CPLEX Optimizer2 on a standard Linux box. Because of the greater complexity of the LP-based approaches (formulating (2) and (8) as LPs requires n² + mn variables), the size of the input data matrices allowed on a standard machine is limited, roughly mn² ≤ 10⁶ (for example, on a two-core machine with 2.99GHz and 2GB of RAM, it already takes about one minute to process a 100-by-100 matrix using CPLEX). |
| Software Dependencies | No | The paper mentions "IBM ILOG CPLEX Optimizer" and "Matlab's rand function" without specifying version numbers for either. A link to CPLEX is provided, but no explicit version is stated in the text. |
| Experiment Setup | Yes | In all experiments the problem dimensions are fixed to m = 50, n = 100 and r = 10. We conducted our experiments with six different data models. As we will describe next, the models differ in the way the factor H is constructed and the sparsity of the noise matrix N. Given a desired noise level ϵ, the noisy r-separable matrix M̃ = M + N = WH + N is generated as follows: The entries of W are drawn uniformly at random from the interval [0, 1] (using Matlab's rand function). Then each column of W is normalized so that its entries sum to one. ...The vector p in the objective function was randomly generated using the randn function of Matlab. |
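The quoted setup (uniform W with column sums one, separable structure M = WH, noise scaled so ∥N∥₁ = ϵ, random column permutation) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the paper uses six different constructions of H and of the sparsity pattern of N, whereas this sketch picks one plausible choice (dense Gaussian noise, remaining columns of H drawn as random convex combinations) purely for illustration.

```python
import numpy as np

def make_noisy_separable(m=50, n=100, r=10, eps=0.1, seed=None):
    """Generate a noisy r-separable matrix M~ = WH + N (illustrative sketch).

    The dimensions m=50, n=100, r=10 match the paper's experiments; the
    specific models for H and N below are assumptions, not the paper's six
    data models.
    """
    rng = np.random.default_rng(seed)

    # W: entries uniform in [0, 1], then each column normalized to sum to one.
    W = rng.random((m, r))
    W /= W.sum(axis=0)

    # H = [I_r, H']: the identity block makes M r-separable; the remaining
    # columns are random convex combinations of the columns of W (assumed).
    Hp = rng.random((r, n - r))
    Hp /= Hp.sum(axis=0)
    H = np.hstack([np.eye(r), Hp])
    M = W @ H

    # Noise scaled so that the induced 1-norm (max absolute column sum)
    # equals eps; whether the paper uses this or the entrywise 1-norm is
    # not clear from the quoted text.
    N = rng.standard_normal((m, n))
    N *= eps / np.linalg.norm(N, 1)

    # Permute columns at random to avoid bias toward the natural ordering.
    perm = rng.permutation(n)
    return (M + N)[:, perm], W, perm
```

With this generator, averaging an algorithm's recovery rate over 25 draws per noise level ϵ mirrors the evaluation loop described in the Dataset Splits row.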