Accelerating Nuclear-norm Regularized Low-rank Matrix Optimization Through Burer-Monteiro Decomposition
Authors: Ching-pei Lee, Ling Liang, Tianyun Tang, Kim-Chuan Toh
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world large-scale problems in recommendation systems, regularized kernel estimation, and molecular conformation confirm that BM-Global can indeed effectively escape spurious local minima at which existing BM approaches are stuck, and is an order of magnitude faster than state-of-the-art algorithms for low-rank matrix optimization problems involving a nuclear-norm regularizer. |
| Researcher Affiliation | Academia | Ching-pei Lee, Institute of Statistical Mathematics; Ling Liang, Department of Mathematics, University of Maryland at College Park; Tianyun Tang, Institute of Operations Research and Analytics, National University of Singapore; Kim-Chuan Toh, Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore |
| Pseudocode | Yes | Algorithm 1: BM-Global; Algorithm 2: LmSVD(L_t, U_{t,0}, p_t); Algorithm 3: The APG method for solving (QSDP) |
| Open Source Code | Yes | Based on this research, we have released an open-source package of the proposed BM-Global at https://www.github.com/leepei/BM-Global/. |
| Open Datasets | Yes | movielens100k: https://www.kaggle.com/prajitdatta/movielens-100k-dataset (we used the ua split). movielens10m: https://www.kaggle.com/smritisingh1997/movielens-10m-dataset (we used the ra split). Netflix: https://www.kaggle.com/netflix-inc/netflix-prize-data. yahoo-music: the R2 one at https://webscope.sandbox.yahoo.com/catalog.php?datatype=r. We consider problems with dissimilarity measures d_ij for 1 ≤ i, j ≤ n collected in Duin and Pekalska (2009). In our experiments, the data d_ij are scaled to the interval [0, 1]. Data available at http://prtools.tudelft.nl/Guide/37Pages/distools.html. In this experiment, we consider the molecules from the Protein Data Bank (see https://www.rcsb.org/) with given noisy and sparse distance data to simulate distances measurable by nuclear magnetic resonance (NMR) experiments. |
| Dataset Splits | Yes | For all data sets, we use their original training/test split. movielens100k: https://www.kaggle.com/prajitdatta/movielens-100k-dataset (we used the ua split). movielens10m: https://www.kaggle.com/smritisingh1997/movielens-10m-dataset (we used the ra split). The elements of the index set Ω are randomly selected such that \|Ω\| ≤ n/20. We select 25% of them to generate our index set Ω. |
| Hardware Specification | Yes | Experiments on the first four data sets are conducted on an Amazon AWS EC2 c6i.4xlarge instance with an 8-core Intel Xeon Ice Lake processor and 32GB memory. For the larger yahoo-music data set, an m6i.4xlarge instance that has the same processor but with 64GB memory is used. Experiments for this part are conducted on a Linux PC with an Intel Xeon E5-2650 processor and 96GB memory. |
| Software Dependencies | No | All algorithms are implemented in MATLAB and C/C++. We apply the polyMF-SS method of Wang et al. (2017) in the matrix completion problem and Manopt (Boumal et al., 2014) for manifold optimization in the nonlinear semidefinite programming problem. We hence only compare BM-Global with the efficient and robust QSDP solver QSDPNAL (Li et al., 2018). We borrow most parts of the implementation of Liu et al. (2013) but impose some minor modifications to adapt it for our purposes. First, as we shall see in the following subsection, instead of using a randomly generated initial point U_{t,0}, we use a more sophisticated initialization scheme. Second, the implementation of Liu et al. (2013) terminates by following a two-level strategy, but in our implementation, we simply terminate the algorithm once the difference between the eigenvalues of L_{t,i} and L_{t,i-1} is small. |
| Experiment Setup | Yes | For the value of λ on the real-world data, we follow the values provided by Hsieh and Olsen (2014) that were obtained through cross-validation, while the final k is the rank of the final output of our algorithm, obtained by running our algorithm with the given λ until the objective cannot be further improved. The value of λ for the toy example is from some simple tuning to make the final rank not too far away from that of the other data sets. We simply assign a fixed step size α_t ≡ α in our implementation. In particular, we test the setting of alternating between x consecutive inexact proximal gradient steps and y consecutive iterations of the BM-phase solver, with x ∈ {1, ..., 5} and y ∈ {1, ..., 8}, for our fixed-step variant with α = 1.99. More details and the results are shown in the supplementary materials. Our results indicate that there is no definite best performer in all cases, but in general, x = 1 and y = 3 seems to be a rather robust choice. We set w_ij = 1 for all (i, j) ∈ Ω and λ = n/10. In our tests, we set τ = 0.1. To measure the accuracy of the estimated positions, we record the root mean square deviation (RMSD): ... Moreover, we let λ = 10√n / Σ_{(i,j)∈Ω} d_ij². |
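The Experiment Setup row quotes a tuned alternation schedule: x consecutive inexact proximal gradient (global) steps followed by y consecutive Burer-Monteiro (local) iterations per outer round, with x = 1, y = 3 reported as a robust choice. The toy sketch below only enumerates that step ordering to make the schedule concrete; the function name and step labels are ours, not the authors' implementation.

```python
# Toy illustration of the x/y alternation schedule described in the
# Experiment Setup row. "PG" marks an inexact proximal gradient step
# (the global phase); "BM" marks a Burer-Monteiro iteration (the local
# phase). This is a hypothetical sketch, not the BM-Global code.

def alternation_schedule(x: int, y: int, rounds: int) -> list[str]:
    """Return the ordered step types for `rounds` outer rounds of
    x proximal-gradient steps followed by y BM iterations."""
    schedule: list[str] = []
    for _ in range(rounds):
        schedule.extend(["PG"] * x)  # global phase: x proximal gradient steps
        schedule.extend(["BM"] * y)  # local phase: y Burer-Monteiro iterations
    return schedule

# The reportedly robust choice x = 1, y = 3, shown for two outer rounds:
print(alternation_schedule(1, 3, 2))
# ['PG', 'BM', 'BM', 'BM', 'PG', 'BM', 'BM', 'BM']
```

Laying the schedule out this way makes clear that the tested grid x ∈ {1, ..., 5}, y ∈ {1, ..., 8} simply varies the lengths of the two phases within each round.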