Pattern Alternating Maximization Algorithm for Missing Data in High-Dimensional Problems
Authors: Nicolas Städler, Daniel J. Stekhoven, Peter Bühlmann
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show on simulated and real data that the new method often improves upon other modern imputation techniques such as k-nearest neighbors imputation, nuclear norm minimization or a penalized likelihood approach with an ℓ1-penalty on the concentration matrix. Keywords: missing data, observed likelihood, (partial) E- and M-Step, Lasso, penalized variational free energy. ... 4. Numerical Experiments: In this section we explore the performance of MissPALasso in recovering missing entries and we report on computational efficiency of the algorithm. |
| Researcher Affiliation | Collaboration | Nicolas Städler EMAIL The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; Daniel J. Stekhoven EMAIL Quantik AG, Bahnhofstrasse 57, 8965 Berikon, Switzerland; Peter Bühlmann EMAIL Seminar for Statistics, ETH Zurich, Rämistrasse 101, 8092 Zurich, Switzerland |
| Pseudocode | Yes | Algorithm 1: MissPA ... Algorithm 2: MissPALasso |
| Open Source Code | No | The paper does not explicitly state that the code for the described methodology is open-source, nor does it provide a link to a code repository. |
| Open Datasets | Yes | 4.1.2 Real Data Examples We consider the following four publicly available data sets: Isoprenoid gene network in Arabidopsis thaliana: ... Wille et al. (2004). Colon cancer: ... Alon et al. (1999). Lymphoma: ... Alizadeh et al. (2000). Yeast cell-cycle: ... Spellman et al. (1998). |
| Dataset Splits | Yes | In each run we generate n = 50 i.i.d. samples from the model. We then delete randomly 5%, 10% and 15% of the values in the data matrix, apply an imputation method and compute the NRMSE. |
| Hardware Specification | No | The paper discusses 'CPU times' but does not specify the type or model of CPU, GPU, or any other specific hardware used for the experiments. |
| Software Dependencies | No | We end this section by illustrating the computational timings of MissPALasso and MissGLasso implemented with the statistical computing language R. The paper mentions the use of 'R' but does not provide a specific version number for R or any specific libraries used. |
| Experiment Setup | Yes | In all of our experiments we select the tuning parameters to obtain optimal prediction of the missing entries in terms of NRMSE. ... For a fixed λ we stop the algorithm if the relative change in imputation satisfies $\lVert \hat{X}^{(r+1)} - \hat{X}^{(r)} \rVert^2 / \lVert \hat{X}^{(r+1)} \rVert^2 \le 10^{-5}$. |
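
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (delete a fraction of entries at random, impute, score by NRMSE, stop on a small relative change) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names are hypothetical, column-mean imputation is only a stand-in for MissPALasso, the NRMSE formula (root mean squared error over the deleted entries, normalized by the variance of their true values) is an assumed definition that the excerpts above do not spell out, and the norms in the stopping rule are taken to be squared Frobenius norms.

```python
import math
import random


def delete_at_random(X, fraction, rng):
    """Copy X, set a random `fraction` of its cells to None, return (copy, deleted cells)."""
    cells = [(i, j) for i in range(len(X)) for j in range(len(X[0]))]
    missing = rng.sample(cells, round(fraction * len(cells)))
    X_miss = [row[:] for row in X]
    for i, j in missing:
        X_miss[i][j] = None
    return X_miss, missing


def column_mean_impute(X_miss):
    """Placeholder imputer (NOT MissPALasso): fill gaps with the observed column mean."""
    means = []
    for j in range(len(X_miss[0])):
        observed = [row[j] for row in X_miss if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in X_miss]


def nrmse(X_true, X_imp, missing):
    """Assumed definition: sqrt(MSE / variance of the true values), over deleted cells only."""
    truth = [X_true[i][j] for i, j in missing]
    guess = [X_imp[i][j] for i, j in missing]
    mse = sum((t - g) ** 2 for t, g in zip(truth, guess)) / len(truth)
    mean_t = sum(truth) / len(truth)
    var_t = sum((t - mean_t) ** 2 for t in truth) / len(truth)
    return math.sqrt(mse / var_t)


def has_converged(X_new, X_old, tol=1e-5):
    """Relative change between successive imputations, using squared Frobenius norms."""
    num = sum((a - b) ** 2 for rn, ro in zip(X_new, X_old) for a, b in zip(rn, ro))
    den = sum(a ** 2 for row in X_new for a in row)
    return num / den <= tol


if __name__ == "__main__":
    rng = random.Random(0)
    # n = 50 samples as in the quoted setup; 10 independent Gaussian columns are an assumption.
    X = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
    for fraction in (0.05, 0.10, 0.15):
        X_miss, missing = delete_at_random(X, fraction, rng)
        X_imp = column_mean_impute(X_miss)
        print(fraction, len(missing), nrmse(X, X_imp, missing))
```

An iterative imputer would call `has_converged` on successive imputed matrices for a fixed λ and stop once it returns `True`; the single-pass mean imputer here needs no such loop and is included only so the scoring step has something to evaluate.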