Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach
Authors: Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of rank, sparsity, and mean-square error while maintaining a comparable runtime. ... In this section, we evaluate the performance of our alternating minimization heuristic (Algorithm 1) and our branch-and-bound method (Algorithm 2) implemented in Julia 1.5.2 using the JuMP.jl package version 0.21.7 and solved using Mosek version 9.2 for the semidefinite subproblems (20). |
| Researcher Affiliation | Academia | Dimitris Bertsimas EMAIL Massachusetts Institute of Technology Cambridge, MA 02139, USA; Ryan Cory-Wright EMAIL Imperial College Business School London, SW7 2AZ, UK; Nicholas A. G. Johnson EMAIL Massachusetts Institute of Technology Cambridge, MA 02139, USA |
| Pseudocode | Yes | Algorithm 1: Alternating Minimization Heuristic; Algorithm 2: Near-Optimal SLR Decomposition |
| Open Source Code | Yes | To bridge the gap between theory and practice, we have made our code freely available on GitHub at github.com/NicholasJohnson2020/SparseLowRankSoftware. |
| Open Datasets | No | All experiments were performed using synthetic data. To generate a synthetic data matrix D, we first fix a problem dimension n, a desired rank for the low-rank matrix k0, a desired sparsity for the sparse matrix k1, and a value σ > 0 that controls the signal-to-noise ratio. Next, we generate a random rank-k0 matrix and a k1-sparse matrix. This describes the process of generating synthetic data rather than providing access to a pre-existing open dataset. |
| Dataset Splits | Yes | We tune the hyperparameters of Algorithm 1, fRPCA, and ScaledGD using 30-fold cross-validation, as proposed by Owen and Perry (2009). For each fold, we randomly sample l columns and rows from the input data matrix D... We set l = ⌊n(1 − √0.7)⌋ so that the training set Dtrain contains at least 70% of the input data. |
| Hardware Specification | Yes | All experiments were performed using synthetic data, and run on MIT's SuperCloud Cluster (Reuther et al., 2018), which hosts Intel Xeon Platinum 8260 processors. The maximum RAM used across all trials was 192GB. |
| Software Dependencies | Yes | implemented in Julia 1.5.2 using the JuMP.jl package version 0.21.7 and solved using Mosek version 9.2 for the semidefinite subproblems (20). |
| Experiment Setup | Yes | For experiments reported in Section 6.3 and Section 6.4, we tune the hyperparameters (λ, µ) for Algorithm 1 from the collection {10^-2/n, 10^-1/n, 10^0/n, 10^1/n} × {10^-2/n, 10^-1/n, 10^0/n, 10^1/n}, and we set the hyperparameter γ = α * k1/n^2 for fRPCA and ScaledGD, where α is tuned (independently for each method) from the collection (0.01, 0.05, 0.1, 0.5, 1, 2, 4, 6, 8, 10). For subsequent experiments in Section 6.1 and beyond, the hyperparameters of Algorithm 1, fRPCA, and ScaledGD are fixed respectively to the best-performing hyperparameters selected via cross-validation in Section 6.3 and Section 6.4. For experiments employing Algorithm 2, we set λ = µ = 1/n. We terminate Algorithm 1, GoDec, fRPCA, and ScaledGD when f_t > 0 and (f_{t-1} − f_t)/f_t < 0.001, where f_t denotes the objective value achieved by the estimate of the low-rank matrix X and the sparse matrix Y at iteration t. |
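
The fold construction quoted under "Dataset Splits" can be sketched as follows. This is a hedged Python illustration, not the authors' Julia code: it holds out l randomly sampled rows and l columns, with l chosen (here via a floor, an assumption on the rounding) so that the retained (n − l) × (n − l) training block keeps at least 70% of the entries of D.

```python
import math
import numpy as np

def bcv_fold(D, rng):
    """One Owen-Perry style fold: hold out l rows and l columns of D.

    This is an illustrative sketch; the function name and rounding choice
    (floor, so the training block keeps >= 70% of entries) are assumptions.
    """
    n = D.shape[0]
    l = math.floor(n * (1 - math.sqrt(0.7)))           # held-out rows/columns
    held_rows = rng.choice(n, size=l, replace=False)
    held_cols = rng.choice(n, size=l, replace=False)
    train_rows = np.setdiff1d(np.arange(n), held_rows)
    train_cols = np.setdiff1d(np.arange(n), held_cols)
    D_train = D[np.ix_(train_rows, train_cols)]        # (n - l) x (n - l) block
    return D_train, held_rows, held_cols

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 100))
D_train, _, _ = bcv_fold(D, rng)
```

With n = 100 this holds out l = 16 rows and columns, so the training block is 84 × 84 and contains 7056 of the 10000 entries, just above the 70% threshold.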
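
The tuning grids and stopping rule quoted under "Experiment Setup" are compact enough to state in code. A minimal Python sketch, assuming a hypothetical problem dimension n = 100 (the authors' implementation is in Julia, and all names here are illustrative):

```python
n = 100  # hypothetical problem dimension

# 4 x 4 tuning grid for (lambda, mu) used by Algorithm 1.
lambda_mu_grid = [(lam, mu)
                  for lam in (1e-2 / n, 1e-1 / n, 1e0 / n, 1e1 / n)
                  for mu in (1e-2 / n, 1e-1 / n, 1e0 / n, 1e1 / n)]

# Candidate alpha values for fRPCA/ScaledGD, with gamma = alpha * k1 / n**2.
alpha_grid = (0.01, 0.05, 0.1, 0.5, 1, 2, 4, 6, 8, 10)

def should_terminate(f_prev, f_curr, tol=1e-3):
    """Relative-improvement stopping rule for Algorithm 1, GoDec, fRPCA, ScaledGD:
    stop once f_t > 0 and (f_{t-1} - f_t) / f_t < tol."""
    return f_curr > 0 and (f_prev - f_curr) / f_curr < tol
```

For example, a drop from f = 1.0 to f = 0.9999 (relative improvement 1e-4) triggers termination, while a drop to f = 0.9 (relative improvement ~0.11) does not.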