Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices

Authors: Doudou Zhou, Tianxi Cai, Junwei Lu

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation studies show that BONMI performs well under a variety of configurations. We further illustrate the utility of BONMI by integrating multi-lingual multi-source medical text and EHR data to perform two tasks: (i) co-training semantic embeddings for medical concepts in both English and Chinese and (ii) the translation between English and Chinese medical concepts. Our method shows an advantage over existing methods.
Researcher Affiliation | Academia | Doudou Zhou EMAIL Department of Biostatistics Harvard T.H. Chan School of Public Health Boston, Massachusetts 02115, USA
Pseudocode | Yes | Algorithm 1: Block-wise Overlapping Noisy Matrix Integration (BONMI). Algorithm 2: BONMI for asymmetric matrices.
Open Source Code | No | The paper includes a CC-BY 4.0 license for the publication itself, and attribution requirements, but does not explicitly state that the source code for the methodology described is openly available, nor does it provide a link to a code repository.
Open Datasets | Yes | The three CUI PPMI matrices are independently derived from three data sources (i) 20 million clinical notes at Stanford (Finlayson et al., 2014); (ii) 10 million notes of 62K patients at Partners Healthcare System (PHS) (Beam et al., 2019); and (iii) health records from MIMIC-III, a freely accessible critical care database (Johnson et al., 2016).
Dataset Splits | Yes | Finally, we obtain 4201 Chinese-CUI pairs, and we use 2000 pairs as the training set (the known overlapping set) and the other 2201 pairs as the test set to evaluate the translation precision.
Hardware Specification | No | The paper discusses computational complexity in Remark 5 but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud resources used for running the experiments.
Software Dependencies | No | The paper mentions software like BERT but does not provide specific version numbers for any libraries or frameworks used in the implementation, which is necessary for reproducibility.
Experiment Setup | Yes | The default choice for η is set as 0, meaning no shift and setting negative PMI values as 0. Empirically, we find that η = 0 works well. We calculate the eigen decay of the overlapping submatrices of each pair of sources and choose the rank r that makes the cumulative eigenvalue percentage of at least one of the matrices more than 95%, which is 300. We then use r = 300 for all methods.
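
The experiment setup quoted above can be illustrated with a minimal sketch. This is not the authors' code, since the paper does not release an implementation. The function names (`positive_pmi`, `rank_for_energy`, `choose_rank`) are hypothetical. The shift form max(PMI - η, 0) and the use of absolute eigenvalues are assumptions. Taking the minimum over matrices is one reading of "at least one of the matrices more than 95%".

```python
import numpy as np


def positive_pmi(cooc, eta=0.0):
    """Assumed form: max(PMI - eta, 0); eta = 0 just zeroes negative PMI."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    # Zero counts give log(0) or 0/0; suppress the warnings and treat them as -inf.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    pmi[~np.isfinite(pmi)] = -np.inf
    return np.maximum(pmi - eta, 0.0)


def rank_for_energy(mat, threshold=0.95):
    """Smallest r whose top-r |eigenvalues| exceed `threshold` of the total."""
    # eigvalsh expects a symmetric matrix; absolute values are an assumption
    # made here because a PPMI matrix need not be positive semidefinite.
    eig = np.sort(np.abs(np.linalg.eigvalsh(mat)))[::-1]
    cumulative = np.cumsum(eig) / eig.sum()
    return int(np.argmax(cumulative > threshold) + 1)


def choose_rank(overlap_blocks, threshold=0.95):
    """Smallest r at which at least one overlapping block passes the threshold."""
    return min(rank_for_energy(block, threshold) for block in overlap_blocks)


if __name__ == "__main__":
    blocks = [np.diag([90.0, 6.0, 3.0, 1.0]), np.diag([50.0, 30.0, 10.0, 10.0])]
    print(choose_rank(blocks))
```

On the real overlapping submatrices the paper reports that this kind of criterion yields r = 300; the toy diagonal blocks here only exercise the logic.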