Distributed Matrix Completion and Robust Factorization
Authors: Lester Mackey, Ameet Talwalkar, Michael I. Jordan
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present experiments in collaborative filtering and video background modeling that demonstrate the near-linear to superlinear speed-ups attainable with this approach. To illustrate the practical speed-up and robustness of DFC, we present experimental results on collaborative filtering, video background modeling, and simulated data in Section 6. |
| Researcher Affiliation | Academia | Lester Mackey (EMAIL), Stanford University, Department of Statistics, 390 Serra Mall, Stanford, CA 94305; Ameet Talwalkar (EMAIL), University of California, Los Angeles, Computer Science Department, 4732 Boelter Hall, Los Angeles, CA 90095; Michael I. Jordan (EMAIL), University of California, Berkeley, Department of Electrical Engineering and Computer Science and Department of Statistics, 465 Soda Hall, Berkeley, CA 94720 |
| Pseudocode | Yes | Algorithm 1 (DFC-Proj) — Input: PΩ(M), t. {PΩ(Ci)}1≤i≤t = SampCol(PΩ(M), t); do in parallel: Ĉ1 = Base-MF-Alg(PΩ(C1)), …, Ĉt = Base-MF-Alg(PΩ(Ct)); end do; L̂proj = ColProjection(Ĉ1, …, Ĉt). Algorithm 2 (DFC-RP) — Input: PΩ(M), t. {PΩ(Ci)}1≤i≤t = SampCol(PΩ(M), t); do in parallel: Ĉ1 = Base-MF-Alg(PΩ(C1)), …, Ĉt = Base-MF-Alg(PΩ(Ct)); end do; k = mediani∈{1,…,t} rank(Ĉi); L̂rp = RandProjection(Ĉ1, …, Ĉt, k). Algorithm 3 (DFC-Nys) — Input: PΩ(M), l, d. (PΩ(C), PΩ(R)) = SampColRow(PΩ(M), l, d); do in parallel: Ĉ = Base-MF-Alg(PΩ(C)), R̂ = Base-MF-Alg(PΩ(R)); end do; L̂nys = GenNyström(Ĉ, R̂) |
| Open Source Code | Yes | In practice, one will typically run DFC jobs in a distributed fashion across a cluster; our released code supports this standard use case. |
| Open Datasets | Yes | We evaluate DFC on two of the largest publicly available collaborative filtering data sets: MovieLens 10M (http://www.grouplens.org/) with m = 10K, n = 72K, s > 10M, and the Netflix Prize data set (http://www.netflixprize.com/) with m = 18K, n = 480K, s > 100M. |
| Dataset Splits | Yes | To generate test sets drawn from the training distribution, for each data set, we aggregated all available rating data into a single training set and withheld test entries uniformly at random, while ensuring that at least one training observation remained in each row and column. The algorithms were then run on the remaining training portions and evaluated on the test portions of each split. The results, averaged over three train-test splits, are summarized in Table 2. |
| Hardware Specification | Yes | In order to provide a fair comparison with baseline algorithms, we perform all experiments on an x86-64 architecture using a single 2.60 GHz core and 30GB of main memory. |
| Software Dependencies | No | The paper refers to using the Accelerated Proximal Gradient (APG) algorithm of Toh and Yun (2010) and Lin et al. (2009b) as base algorithms but does not specify software implementation versions (e.g., Python, PyTorch, etc.) or specific versions of libraries/solvers. |
| Experiment Setup | Yes | We use the default parameter settings suggested by Toh and Yun (2010) and Lin et al. (2009b), and measure estimation error via root mean square error (RMSE). |
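The DFC-Proj pseudocode quoted above admits a compact sketch. This is not the authors' released code: all names here (`base_mf`, `dfc_proj`) are illustrative, a truncated SVD on fully observed column blocks stands in for the base matrix-factorization subroutine (the paper uses completion solvers such as APG on partially observed blocks), and the column blocks are factored sequentially rather than in parallel.

```python
import numpy as np

def base_mf(C, rank):
    """Stand-in for Base-MF-Alg: rank-truncated SVD of a fully observed
    column block. A real DFC run would apply a matrix-completion solver
    (e.g., APG) to PΩ(C) instead."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def dfc_proj(M, t, rank):
    """DFC-Proj sketch: SampCol partitions the columns into t blocks,
    each block is factored independently (in parallel in practice), and
    ColProjection projects every recovered block onto the column space
    of the first one before re-assembling the estimate."""
    blocks = np.array_split(M, t, axis=1)          # SampCol
    C_hat = [base_mf(C, rank) for C in blocks]     # parallel base runs
    Q, _ = np.linalg.qr(C_hat[0])                  # basis for span(C_hat_1)
    return np.hstack([Q @ (Q.T @ C) for C in C_hat])

# Usage: a fully observed rank-4 matrix is recovered exactly, since each
# column block of a generic low-rank matrix shares its column space.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 40))
L_hat = dfc_proj(M, t=4, rank=4)
rel_err = np.linalg.norm(L_hat - M) / np.linalg.norm(M)
```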
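The GenNyström step of DFC-Nys (Algorithm 3) combines the recovered column submatrix Ĉ and row submatrix R̂ through the pseudoinverse of their shared corner block, L̂nys = Ĉ W⁺ R̂. A minimal sketch under the assumption of full observation, with illustrative names:

```python
import numpy as np

def generalized_nystrom(C_hat, R_hat, d, l):
    """Generalized Nyström reconstruction L̂ = Ĉ W⁺ R̂, where W is the
    d×l block shared by the column submatrix Ĉ (m×l) and the row
    submatrix R̂ (d×n)."""
    W = R_hat[:, :l]  # equals C_hat[:d, :] up to base-algorithm error
    return C_hat @ np.linalg.pinv(W) @ R_hat

# Usage: for a generic rank-r matrix, recovery is exact whenever the
# shared block W also has rank r.
rng = np.random.default_rng(1)
M = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 80))
C, R = M[:, :20], M[:5, :]       # l = 20 columns, d = 5 rows
L_nys = generalized_nystrom(C, R, d=5, l=20)
rel_err = np.linalg.norm(L_nys - M) / np.linalg.norm(M)
```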