On Low-rank Trace Regression under General Sampling Distribution

Authors: Nima Hamidi, Mohsen Bayati

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, using simulations on synthetic and real data, we show that the cross-validated estimator selects a near-optimal penalty parameter and outperforms the theory-inspired approach of selecting the parameter. Keywords: Matrix Completion, Multi-task Learning, Compressed Sensing, Low-rank Matrices, Cross-validation. 4.1 Simulations: In this section, we test the empirical performance of the cross-validated estimator, using synthetic and real data. For the synthetic data setting, we generate a d × d matrix B of rank r. Following a similar approach as in (Keshavan et al., 2010), we first generate d × r matrices B_L and B_R with independent standard normal entries and then set B := B_L B_R^T. For the real data setting, we use movie ratings from the MovieLens data set, from which we take the d most frequently watched movies and the d users that rated the most movies.
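The synthetic data generation quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the sizes d and r below are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 100, 5  # placeholder dimensions, not taken from the paper

# Following Keshavan et al. (2010): two d x r factors with
# independent standard normal entries ...
B_L = rng.standard_normal((d, r))
B_R = rng.standard_normal((d, r))

# ... whose product gives a rank-r, d x d target matrix B := B_L B_R^T.
B = B_L @ B_R.T
```

With probability one the Gaussian factors have full column rank, so B has rank exactly r.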
Researcher Affiliation | Academia | Nima Hamidi, Department of Statistics, Stanford University, Stanford, CA 94305, USA; Mohsen Bayati, Graduate School of Business, Stanford University, Stanford, CA 94305, USA
Pseudocode | No | The paper describes algorithms and estimation approaches (e.g., Section 3.2 Estimation mentions convex programs and alternating minimization approaches), but it does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All codes are available in this repository https://github.com/mohsenbayati/cv-impute.
Open Datasets | Yes | For the real data setting, we use movie ratings from the MovieLens data set (the file ratings.csv from https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset), from which we take the d most frequently watched movies and the d users that rated the most movies.
Dataset Splits | Yes | We use 10 folds (i.e., K = 10) and a set of regularization parameters Λ = {λ_i}_{i=1}^{L} constructed exactly like the ones in the oracle estimator; however, since cv does not have access to λ_0, L is the smallest integer with λ_L ≤ 0.001 λ_max.
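The grid-termination rule described here can be sketched as follows. Note the hedging: the excerpt does not specify how the oracle grid is spaced, so the geometric decay factor below is an invented assumption; only the stopping rule λ_L ≤ 0.001 λ_max comes from the text:

```python
lam_max = 1.0   # placeholder; the paper derives the grid from the data
decay = 0.7     # ASSUMED geometric decay factor, not stated in this excerpt

# Build Λ = {λ_1, ..., λ_L}, where L is the smallest integer
# such that λ_L <= 0.001 * λ_max.
grid = [lam_max]
while grid[-1] > 0.001 * lam_max:
    grid.append(grid[-1] * decay)
```

By construction, the last grid point is the first to fall at or below 0.001 λ_max, matching the stated definition of L.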
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers used for the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In this section, we test the empirical performance of the cross-validated estimator, using synthetic and real data. For the synthetic data setting, we generate a d × d matrix B of rank r. ... For the distribution of observations, Π, we consider the matrix completion case. Specifically, for each i ∈ [n], r_i and c_i are integers in [d], selected independently and uniformly at random. Then, X_i = e_{r_i} e_{c_i}^T. This leads to n observations Y_i = B_{r_i c_i} + ε_i, where the ε_i are taken to be i.i.d. standard normal random variables. Given these observations, we compare the estimation error of the following five different estimators: 1. The theory-1, theory-2, and theory-3 estimators solve the convex program Eq. (1.2) for a given value of λ = λ_0 that is motivated by the theoretical results. Specifically, by Remark 3.3, we need λ_0 ≥ 3‖Σ‖_op to hold with high probability, which means that we select λ_0 so that λ_0 ≥ 3‖Σ‖_op holds with probability 0.9. For each sample of size n, we find λ_0 by generating 1000 independent data sets of the same size and then, for the theory-3 estimator, we choose the 100th biggest value of 3‖Σ‖_op. ... 3. The cv estimator is introduced at the beginning of this section. We use 10 folds (i.e., K = 10) and a set of regularization parameters Λ = {λ_i}_{i=1}^{L} constructed exactly like the ones in the oracle estimator; however, since cv does not have access to λ_0, L is the smallest integer with λ_L ≤ 0.001 λ_max.
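The sampling model in this setup (uniform-at-random matrix completion with standard normal noise) can be sketched as below. This is an illustrative sketch, not the code from the linked repository; d and n are placeholder sizes and B is a stand-in for the rank-r target:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 100, 2000                     # placeholder sizes for illustration
B = rng.standard_normal((d, d))      # stand-in for the rank-r target matrix

# Matrix completion design: r_i and c_i are drawn independently and
# uniformly from [d], and X_i = e_{r_i} e_{c_i}^T, so the trace inner
# product <X_i, B> just reads the single entry B[r_i, c_i].
rows = rng.integers(0, d, size=n)
cols = rng.integers(0, d, size=n)

# Observations Y_i = B_{r_i c_i} + eps_i with i.i.d. standard normal noise.
eps = rng.standard_normal(n)
Y = B[rows, cols] + eps
```

Drawing rows and columns independently means some entries may be sampled more than once, which is exactly the i.i.d. sampling-with-replacement model the paper's general-distribution setting covers.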