Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices
Authors: Doudou Zhou, Tianxi Cai, Junwei Lu
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation studies show that BONMI performs well under a variety of configurations. We further illustrate the utility of BONMI by integrating multi-lingual multi-source medical text and EHR data to perform two tasks: (i) co-training semantic embeddings for medical concepts in both English and Chinese and (ii) the translation between English and Chinese medical concepts. Our method shows an advantage over existing methods. |
| Researcher Affiliation | Academia | Doudou Zhou, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA |
| Pseudocode | Yes | Algorithm 1: Block-wise Overlapping Noisy Matrix Integration (BONMI). Algorithm 2: BONMI for asymmetric matrices. |
| Open Source Code | No | The paper carries a CC BY 4.0 license for the publication itself, with attribution requirements, but it does not state that source code for the described methodology is openly available, nor does it link to a code repository. |
| Open Datasets | Yes | The three CUI PPMI matrices are independently derived from three data sources: (i) 20 million clinical notes at Stanford (Finlayson et al., 2014); (ii) 10 million notes of 62K patients at Partners Healthcare System (PHS) (Beam et al., 2019); and (iii) health records from MIMIC-III, a freely accessible critical care database (Johnson et al., 2016). |
| Dataset Splits | Yes | Finally, we obtain 4201 Chinese-CUI pairs, and we use 2000 pairs as the training set (the known overlapping set) and the other 2201 pairs as the test set to evaluate the translation precision. |
| Hardware Specification | No | The paper discusses computational complexity in Remark 5 but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software like BERT but does not provide specific version numbers for any libraries or frameworks used in the implementation, which is necessary for reproducibility. |
| Experiment Setup | Yes | The default choice for η is 0, meaning no shift, with negative PMI values set to 0. Empirically, we find that η = 0 works well. We calculate the eigen decay of the overlapping submatrices of each pair of sources and choose the rank r at which the cumulative eigenvalue percentage of at least one of the matrices exceeds 95%, which gives r = 300. We then use r = 300 for all methods. |
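The η parameter in the Experiment Setup row refers to a shifted positive PMI (PPMI) transform of co-occurrence counts, where η = 0 reduces to plain PPMI (negative PMI values clipped to 0). A minimal sketch of that transform, assuming a symmetric co-occurrence count matrix (the function name `sppmi` is ours, not the paper's):

```python
import numpy as np

def sppmi(counts, eta=0.0):
    """Shifted positive PMI from a symmetric co-occurrence count matrix.

    PMI(i, j) = log( N * c_ij / (c_i * c_j) ), then shift by eta and
    clip negatives to 0. With eta = 0 this is plain PPMI.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        # zero counts give log(0) = -inf, which clips to 0 below
        pmi = np.log((counts * total) / (row * col))
    return np.maximum(pmi - eta, 0.0)

m = sppmi(np.array([[2.0, 1.0], [1.0, 0.0]]))
```

In this toy example the zero co-occurrence cell and the negative-PMI diagonal cell both map to 0, while the positive-PMI off-diagonal cells survive.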
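The rank-selection rule described above (the smallest r at which the cumulative eigenvalue percentage reaches 95%) can be sketched for a single overlapping submatrix as follows; this is an illustrative reading of the rule, not the authors' code:

```python
import numpy as np

def select_rank(mat, threshold=0.95):
    """Smallest rank r whose top-r eigenvalues account for at least
    `threshold` of the total eigenvalue mass (PSD input assumed)."""
    eigvals = np.linalg.eigvalsh(mat)[::-1]   # descending order
    eigvals = np.clip(eigvals, 0.0, None)     # guard tiny negative round-off
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, threshold) + 1)

# Toy PSD matrix of exact rank 5: the rule recovers a rank of at most 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
r = select_rank(A @ A.T)
```

In the paper this rule is applied to the overlapping submatrices of each pair of sources, and the resulting r = 300 is used for all methods.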
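The Dataset Splits row describes a translation task in which 2000 known English–Chinese concept pairs are used for training and 2201 for testing. A standard way to use such known pairs (sketched here as a generic illustration under our own assumptions, not necessarily BONMI's exact procedure) is orthogonal Procrustes alignment of the two embedding spaces:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F, where rows of X and Y
    are embeddings of known translation pairs (the training set)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: if Y is an exact rotation of X, the rotation is recovered.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
W = procrustes_align(X, X @ Q)
```

Test-set translation would then map an unseen English embedding x to x @ W and retrieve the nearest Chinese concept embedding.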