Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices
Authors: Doudou Zhou, Tianxi Cai, Junwei Lu
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation studies show that BONMI performs well under a variety of configurations. We further illustrate the utility of BONMI by integrating multi-lingual multi-source medical text and EHR data to perform two tasks: (i) co-training semantic embeddings for medical concepts in both English and Chinese and (ii) the translation between English and Chinese medical concepts. Our method shows an advantage over existing methods. |
| Researcher Affiliation | Academia | Doudou Zhou, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA |
| Pseudocode | Yes | Algorithm 1: Block-wise Overlapping Noisy Matrix Integration (BONMI). Algorithm 2: BONMI for asymmetric matrices. |
| Open Source Code | No | The paper carries a CC BY 4.0 license for the publication itself, with attribution requirements, but it does not state that source code for the described methodology is openly available, nor does it link to a code repository. |
| Open Datasets | Yes | The three CUI PPMI matrices are independently derived from three data sources: (i) 20 million clinical notes at Stanford (Finlayson et al., 2014); (ii) 10 million notes of 62K patients at Partners Healthcare System (PHS) (Beam et al., 2019); and (iii) health records from MIMIC-III, a freely accessible critical care database (Johnson et al., 2016). |
| Dataset Splits | Yes | Finally, we obtain 4201 Chinese-CUI pairs, and we use 2000 pairs as the training set (the known overlapping set) and the other 2201 pairs as the test set to evaluate the translation precision. |
| Hardware Specification | No | The paper discusses computational complexity in Remark 5 but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software like BERT but does not provide specific version numbers for any libraries or frameworks used in the implementation, which is necessary for reproducibility. |
| Experiment Setup | Yes | The default choice for η is 0, meaning no shift, with negative PMI values set to 0. Empirically, we find that η = 0 works well. We calculate the eigen decay of the overlapping submatrices of each pair of sources and choose the rank r at which the cumulative eigenvalue percentage of at least one of the matrices exceeds 95%, which gives r = 300. We then use r = 300 for all methods. |
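The η parameter in the Experiment Setup row refers to a shifted positive PMI (PPMI) transform of co-occurrence counts, where η = 0 reduces to plain PPMI (negative PMI values clipped to 0). A minimal sketch of that transform, assuming a symmetric co-occurrence count matrix (the function name `sppmi` is ours, not the paper's):

```python
import numpy as np

def sppmi(counts, eta=0.0):
    """Shifted positive PMI from a symmetric co-occurrence count matrix.

    PMI(i, j) = log( N * c_ij / (c_i * c_j) ), then shift by eta and
    clip negatives to 0. With eta = 0 this is plain PPMI.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        # zero counts give log(0) = -inf, which clips to 0 below
        pmi = np.log((counts * total) / (row * col))
    return np.maximum(pmi - eta, 0.0)

m = sppmi(np.array([[2.0, 1.0], [1.0, 0.0]]))
```

In this toy example the zero co-occurrence cell and the negative-PMI diagonal cell both map to 0, while the positive-PMI off-diagonal cells survive.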
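The rank-selection rule described above (the smallest r at which the cumulative eigenvalue percentage reaches 95%) can be sketched for a single overlapping submatrix as follows; this is an illustrative reading of the rule, not the authors' code:

```python
import numpy as np

def select_rank(mat, threshold=0.95):
    """Smallest rank r whose top-r eigenvalues account for at least
    `threshold` of the total eigenvalue mass (PSD input assumed)."""
    eigvals = np.linalg.eigvalsh(mat)[::-1]   # descending order
    eigvals = np.clip(eigvals, 0.0, None)     # guard tiny negative round-off
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, threshold) + 1)

# Toy PSD matrix of exact rank 5: the rule recovers a rank of at most 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
r = select_rank(A @ A.T)
```

In the paper this rule is applied to the overlapping submatrices of each pair of sources, and the resulting r = 300 is used for all methods.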
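The Dataset Splits row describes a translation task in which 2000 known English–Chinese concept pairs are used for training and 2201 for testing. A standard way to use such known pairs (sketched here as a generic illustration under our own assumptions, not necessarily BONMI's exact procedure) is orthogonal Procrustes alignment of the two embedding spaces:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F, where rows of X and Y
    are embeddings of known translation pairs (the training set)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: if Y is an exact rotation of X, the rotation is recovered.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
W = procrustes_align(X, X @ Q)
```

Test-set translation would then map an unseen English embedding x to x @ W and retrieve the nearest Chinese concept embedding.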