The Correlation-assisted Missing Data Estimator

Authors: Timothy I. Cannings, Yingying Fan

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also include practical demonstrations throughout the paper using simulated data and the Terneuzen birth cohort and Brandsma datasets available from CRAN.
Researcher Affiliation | Academia | Timothy I. Cannings (EMAIL), School of Mathematics, University of Edinburgh, Edinburgh, UK. Yingying Fan (EMAIL), Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA.
Pseudocode | No | The paper describes its methods mathematically and provides examples, but no explicit pseudocode blocks or algorithm sections were found.
Open Source Code | No | The paper mentions using existing R packages, such as `mice`, `ks`, and `regpro`, that are available on CRAN. However, it does not state that the authors' own implementation of the methods described in the paper is released or otherwise available.
Open Datasets | Yes | We also include practical demonstrations throughout the paper using simulated data and the Terneuzen birth cohort and Brandsma datasets available from CRAN.
Dataset Splits | Yes | In order to evaluate the performance of the CAM estimator, we take a subsample of size 1000 from the complete cases to use as a test set (this is fixed throughout). We carry out 100 experiments. In each one, we form a training set by taking another sample of size 200 from the remaining 2464 complete cases (this sample is different in each experiment). The 200 chosen complete cases are then combined with the observations in A_{m1}, A_{m2} and A_{m3} (which are the same in every experiment). Thus, in each experiment, we have n_0 = 200, n_{m1} = 302, n_{m2} = 182, and n_{m3} = 108.
Hardware Specification | No | The paper discusses computational cost in general terms but does not mention the specific hardware (e.g., CPU or GPU models, or memory) used to run the experiments.
Software Dependencies | No | The kernel density estimators are computed using the ks package available from CRAN. In the regression settings, we make use of the regpro package available from CRAN. Our implementation utilises the mice R package available from CRAN (van Buuren et al., 2018). Although these packages are named, no version numbers are given for them or for R itself.
Experiment Setup | Yes | In each case, we generate a training set of size n ∈ {200, 500}, and then introduce missingness by removing the first component of X independently with probability p_1 ∈ {0.25, 0.5, 0.75}. The kernel density estimators are computed using the ks package available from CRAN. In particular, we use the kde function with a Gaussian kernel, and the diagonal bandwidth matrices were chosen using the Hpi.diag function. For the Brandsma dataset, we use a Gaussian kernel and the bandwidth was chosen using leave-one-out cross-validation.
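The split protocol described in the Dataset Splits entry can be sketched in Python with NumPy. This is a hypothetical illustration, not the authors' code. It draws row indices only and assumes the complete cases are indexed 0 to 3463 (the 1000 test cases plus the 2464 remaining). The function name `make_splits`, the constants, and the seed are illustrative choices. The incomplete cases in A_{m1}, A_{m2} and A_{m3} are fixed, so they are represented only by their counts.

```python
import numpy as np

# Sizes taken from the Dataset Splits entry; 3464 = 1000 test + 2464 remaining.
N_COMPLETE, N_TEST, N_TRAIN, N_EXPERIMENTS = 3464, 1000, 200, 100
N_INCOMPLETE = {"m1": 302, "m2": 182, "m3": 108}  # fixed incomplete-case groups


def make_splits(seed=0):
    """Hypothetical sketch: fix one test set of complete cases, then draw a
    fresh training subsample from the remaining complete cases per experiment."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N_COMPLETE)
    test_idx = perm[:N_TEST]   # held fixed across all experiments
    pool = perm[N_TEST:]       # the remaining complete cases
    train_sets = [
        rng.choice(pool, size=N_TRAIN, replace=False)  # differs per experiment
        for _ in range(N_EXPERIMENTS)
    ]
    return test_idx, pool, train_sets


test_idx, pool, train_sets = make_splits()
n_total = N_TRAIN + sum(N_INCOMPLETE.values())
print(len(test_idx), len(pool), len(train_sets), n_total)  # prints: 1000 2464 100 792
```

Each training set therefore holds 200 + 302 + 182 + 108 = 792 observations, of which only the 200 complete cases change from one experiment to the next.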
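The missingness mechanism and the Gaussian kernel density estimate in the Experiment Setup entry can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' R code. The bivariate standard normal data, the function names, and the fixed bandwidths `h` are hypothetical. In the paper the diagonal bandwidth matrix is selected with `Hpi.diag` from the ks package, which this sketch does not reproduce; here H = diag(h_1^2, h_2^2) is passed in directly. Fitting the density to the complete cases only is likewise an illustrative choice.

```python
import numpy as np


def introduce_missingness(X, p1, rng):
    """Remove the first component of each row independently with
    probability p1 (missing completely at random); NaN marks a gap."""
    X_obs = np.array(X, dtype=float)      # copy so the input is untouched
    missing = rng.random(X_obs.shape[0]) < p1
    X_obs[missing, 0] = np.nan
    return X_obs, missing


def gaussian_kde_diag(x_eval, data, h):
    """Gaussian kernel density estimate with diagonal bandwidth matrix
    H = diag(h**2): the average over data points i of the product kernel
    prod_j phi((x_j - X_ij) / h_j) / h_j.
    x_eval: (m, d) evaluation points; data: (n, d); h: (d,) bandwidths."""
    z = (x_eval[:, None, :] - data[None, :, :]) / h          # (m, n, d)
    log_k = -0.5 * z**2 - np.log(np.sqrt(2.0 * np.pi) * h)   # per-coordinate log kernel
    return np.exp(log_k.sum(axis=2)).mean(axis=1)            # (m,)


rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))            # hypothetical training set, n = 500
X_obs, missing = introduce_missingness(X, p1=0.5, rng=rng)
complete = X_obs[~missing]                   # rows with both components observed
h = np.array([0.4, 0.4])                     # hypothetical; the paper uses Hpi.diag
print(missing.mean(), gaussian_kde_diag(np.zeros((1, 2)), complete, h))
```

Looping over p_1 ∈ {0.25, 0.5, 0.75} and n ∈ {200, 500} gives the six settings listed in the entry; only the missingness mask depends on p_1.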