Matrix Completion with Covariate Information and Informative Missingness
Authors: Huaqing Jin, Yanyuan Ma, Fei Jiang
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The method is demonstrated via simulation studies and is used to analyze a Yelp data set and a Movie Lens data set. |
| Researcher Affiliation | Academia | Huaqing Jin EMAIL Department of Statistics and Actuarial Science The University of Hong Kong, Hong Kong; Yanyuan Ma EMAIL Department of Statistics Pennsylvania State University; Fei Jiang EMAIL Department of Epidemiology and Biostatistics The University of California, San Francisco |
| Pseudocode | No | The paper describes the computational algorithm in Section 4 (Computational Algorithm and Convergence), outlining the steps for updating β and Θ, but it does so in narrative text and mathematical expressions rather than a structured pseudocode block or algorithm environment. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | The method is demonstrated via simulation studies and is used to analyze a Yelp data set and a Movie Lens data set. The data set is available at https://www.yelp.com/dataset/documentation/main. We compare the performance of our method with that of the WCF method proposed in Hu et al. (2008) on Movie Lens 1M data set, which includes one million ratings from 6040 users and 3952 movies. Movie Lens (https://grouplens.org/datasets/movielens/) |
| Dataset Splits | No | The paper describes a process for introducing additional missingness into the data for evaluation purposes (e.g., 'We first remove the observed Yij's with probabilities p1 and p0 for Yij = 1 and Yij = 0, respectively, where p0 and p1 with p0 = p1 are chosen so that an additional α100% missingness is introduced into the data.') to simulate an MNAR scenario, but it does not provide explicit train/test/validation splits for the overall datasets used to reproduce model training. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We design Θ0 to be a rank-5 matrix with singular values (10, 1.8, 1.6, 1.4, 1.2), and β0 = (1, 0, 2, 0, 3, 4, 5, 0, . . . , 0)^T. We set m = n, and vary m, n from 100 to 1600. Furthermore, we generate Rij from Pr(Rij = 1 \| Yij) = expit(Yij − Ȳ − D), where Ȳ = Σ_{i,j} Yij/(mn) and D is chosen to achieve 90% missingness in the data. As specified in Theorems 1 and 2, we select λβ = Cβ log{max(p, mn)}/(mn) and λΘ = CΘ max[√{log(d)/d}, {log max(p, mn)}^{1/4}/d], where Cβ and CΘ are constants chosen to achieve similar sparseness of the estimators across all situations. In the implementation, we use the Monte Carlo method to approximate the integration in the loss function, while the distribution of Xij is estimated empirically. At the tth iteration, we sample M copies of βt^T Xij from the empirical distribution, and approximate the integration. |
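
The simulation design quoted in the Experiment Setup row can be sketched as follows. This is a hypothetical illustration, not the authors' code: the outcome model (linear signal plus Gaussian noise), the covariate dimension `p = 20`, and the bisection search for the missingness constant `D` are all assumptions filled in for demonstration; only the rank-5 Θ0 with the stated singular values, the sparse β0, and the expit missingness mechanism targeting 90% missingness come from the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Smallest stated matrix size; the paper varies m = n from 100 to 1600.
m = n = 100

# Rank-5 Theta0 with singular values (10, 1.8, 1.6, 1.4, 1.2),
# built from random orthonormal factors.
U, _ = np.linalg.qr(rng.standard_normal((m, 5)))
V, _ = np.linalg.qr(rng.standard_normal((n, 5)))
Theta0 = U @ np.diag([10, 1.8, 1.6, 1.4, 1.2]) @ V.T

# Sparse beta0 = (1, 0, 2, 0, 3, 4, 5, 0, ..., 0)^T; p = 20 is assumed.
p = 20
beta0 = np.zeros(p)
beta0[[0, 2, 4, 5, 6]] = [1, 2, 3, 4, 5]

# Hypothetical outcome model: covariate signal plus low-rank mean plus noise.
X = rng.standard_normal((m, n, p))
Y = X @ beta0 + Theta0 + rng.standard_normal((m, n))

# MNAR observation indicators: Pr(R_ij = 1 | Y_ij) = expit(Y_ij - Ybar - D),
# with D tuned so that about 90% of entries are missing.
Ybar = Y.mean()

def miss_rate(D):
    """Expected missingness for a given offset D (increasing in D)."""
    return 1.0 - expit(Y - Ybar - D).mean()

# Crude bisection for D (illustration only).
lo, hi = -20.0, 20.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if miss_rate(mid) < 0.9:
        lo = mid
    else:
        hi = mid
D = 0.5 * (lo + hi)

R = rng.binomial(1, expit(Y - Ybar - D))
print(f"achieved missingness: {1 - R.mean():.2f}")
```

The bisection step stands in for whatever root-finding the authors used; any monotone solver for `miss_rate(D) = 0.9` would serve the same purpose.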