Distributed Matrix Completion and Robust Factorization

Authors: Lester Mackey, Ameet Talwalkar, Michael I. Jordan

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also present experiments in collaborative filtering and video background modeling that demonstrate the near-linear to superlinear speed-ups attainable with this approach. To illustrate the practical speed-up and robustness of DFC, we present experimental results on collaborative filtering, video background modeling, and simulated data in Section 6."
Researcher Affiliation | Academia | Lester Mackey (EMAIL), Stanford University, Department of Statistics, 390 Serra Mall, Stanford, CA 94305; Ameet Talwalkar (EMAIL), University of California, Los Angeles, Computer Science Department, 4732 Boelter Hall, Los Angeles, CA 90095; Michael I. Jordan (EMAIL), University of California, Berkeley, Department of Electrical Engineering and Computer Science and Department of Statistics, 465 Soda Hall, Berkeley, CA 94720
Pseudocode | Yes |

Algorithm 1 DFC-Proj
  Input: PΩ(M), t
  {PΩ(Ci)}1≤i≤t = SampCol(PΩ(M), t)
  do in parallel
    Ĉ1 = Base-MF-Alg(PΩ(C1))
    ...
    Ĉt = Base-MF-Alg(PΩ(Ct))
  end do
  L̂proj = ColProjection(Ĉ1, ..., Ĉt)

Algorithm 2 DFC-RP
  Input: PΩ(M), t
  {PΩ(Ci)}1≤i≤t = SampCol(PΩ(M), t)
  do in parallel
    Ĉ1 = Base-MF-Alg(PΩ(C1))
    ...
    Ĉt = Base-MF-Alg(PΩ(Ct))
  end do
  k = median_{i ∈ {1,...,t}} rank(Ĉi)
  L̂rp = RandProjection(Ĉ1, ..., Ĉt, k)

Algorithm 3 DFC-Nys
  Input: PΩ(M), l, d
  PΩ(C), PΩ(R) = SampColRow(PΩ(M), l, d)
  do in parallel
    Ĉ = Base-MF-Alg(PΩ(C))
    R̂ = Base-MF-Alg(PΩ(R))
  end do
  L̂nys = GenNyström(Ĉ, R̂)
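The divide-and-conquer pattern in Algorithm 1 (DFC-Proj) can be sketched as follows. This is a minimal illustration, not the authors' released code: `base_mf` is a placeholder for any base matrix-completion solver (e.g. APG), the blocks are solved sequentially here rather than in parallel, and the final step is the column-projection stitch L̂proj = Ĉ1 Ĉ1⁺ [Ĉ1, ..., Ĉt].

```python
import numpy as np

def dfc_proj(M_obs, mask, t, base_mf):
    """Sketch of DFC-Proj: partition the columns into t blocks, run a
    base matrix-factorization solver on each block, then stitch the
    block estimates via projection onto the first block's column space.

    base_mf(block, block_mask) is a stand-in for any base completion
    algorithm; here it just needs to return an estimate of its block.
    """
    n = M_obs.shape[1]
    blocks = np.array_split(np.arange(n), t)          # SampCol step
    # "do in parallel" in the paper; run sequentially for simplicity
    C_hats = [base_mf(M_obs[:, b], mask[:, b]) for b in blocks]
    C1 = C_hats[0]
    # ColProjection: project every block onto col-space of the first
    proj = C1 @ np.linalg.pinv(C1)
    return np.hstack([proj @ C for C in C_hats])
```

With a fully observed low-rank matrix and a trivial base solver that returns its block unchanged, the projection step recovers the matrix exactly, since the first block already spans the column space.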
Open Source Code | Yes | "In practice, one will typically run DFC jobs in a distributed fashion across a cluster; our released code supports this standard use case."
Open Datasets | Yes | "We evaluate DFC on two of the largest publicly available collaborative filtering data sets: MovieLens 10M (http://www.grouplens.org/) with m = 10K, n = 72K, s > 10M, and the Netflix Prize data set (http://www.netflixprize.com/) with m = 18K, n = 480K, s > 100M."
Dataset Splits | Yes | "To generate test sets drawn from the training distribution, for each data set, we aggregated all available rating data into a single training set and withheld test entries uniformly at random, while ensuring that at least one training observation remained in each row and column. The algorithms were then run on the remaining training portions and evaluated on the test portions of each split. The results, averaged over three train-test splits, are summarized in Table 2."
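The split protocol quoted above can be sketched as a greedy hold-out that never empties a row or column of training observations. The function name and the greedy strategy are illustrative assumptions, not the authors' actual split code; the protocol only specifies the invariant being enforced.

```python
import numpy as np

def holdout_split(rows, cols, test_frac, rng):
    """Sketch: withhold rating entries uniformly at random for testing
    while keeping at least one training observation in every row and
    column that has any observations (the paper's stated invariant).

    rows, cols: parallel index arrays, one entry per observed rating.
    Returns boolean train/test masks over the observed entries.
    """
    s = len(rows)
    test = np.zeros(s, dtype=bool)
    row_left = np.bincount(rows)          # training entries left per row
    col_left = np.bincount(cols)          # training entries left per col
    target = int(test_frac * s)
    taken = 0
    for i in rng.permutation(s):          # uniformly random order
        if taken >= target:
            break
        r, c = rows[i], cols[i]
        if row_left[r] > 1 and col_left[c] > 1:   # keep invariant
            test[i] = True
            row_left[r] -= 1
            col_left[c] -= 1
            taken += 1
    return ~test, test
```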
Hardware Specification | Yes | "In order to provide a fair comparison with baseline algorithms, we perform all experiments on an x86-64 architecture using a single 2.60 GHz core and 30GB of main memory."
Software Dependencies | No | The paper uses the Accelerated Proximal Gradient (APG) algorithms of Toh and Yun (2010) and Lin et al. (2009b) as base algorithms, but it does not specify the software implementations or versions involved (e.g., the programming language, or specific library and solver versions).
Experiment Setup | Yes | "We use the default parameter settings suggested by Toh and Yun (2010) and Lin et al. (2009b), and measure estimation error via root mean square error (RMSE)."
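The RMSE metric named in the setup row is computed over held-out entries only. A minimal sketch, assuming entry-wise evaluation on a boolean mask of test positions:

```python
import numpy as np

def rmse(pred, truth, mask):
    """Root mean square error over the held-out (masked) entries,
    the metric used to score completion quality."""
    diff = (pred - truth)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```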