ECG Representation Learning with Multi-Modal EHR Data
Authors: Sravan Kumar Lalam, Hari Krishna Kunderu, Shayan Ghosh, Harish Kumar A, Samir Awasthi, Ashim Prasad, Francisco Lopez-Jimenez, Zachi I Attia, Samuel Asirvatham, Paul Friedman, Rakesh Barve, Melwin Babu
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We pre-train the models on a large proprietary dataset of about 9 million ECGs from around 2.4 million patients and evaluate the pre-trained models on various downstream tasks such as classification, zero-shot retrieval, and out-of-distribution detection involving the prediction of various heart conditions using ECG waveforms as input, and demonstrate that the models presented in this work show significant improvements compared to all baseline models. |
| Researcher Affiliation | Collaboration | 1 Nference Inc., 2 Anumana Inc., 3 Mayo Clinic, USA. Corresponding authors: {EMAIL, EMAIL} |
| Pseudocode | No | The paper describes methods using mathematical equations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | We also evaluate all pre-trained models on two publicly available datasets: (i) PhysioNet2020 (Alday et al., 2020), which consists of a collection of six 12-lead ECG datasets with varying signal lengths and sampling rates, (ii) Chapman (Zheng et al., 2020), which contains 10-second long 12-lead ECGs (see Appendix A.3 for more details). |
| Dataset Splits | Yes | We initially split all the patients into the global train, validation, and test sets in a 60%, 5%, and 35% ratio, which are then used to create pre-training datasets and disease cohorts for downstream classification tasks. In particular, train, validation, and test sets for pre-training and classification tasks are created by drawing the EHRs from the global train, validation, and test patients respectively. This approach ensures that we can effectively evaluate the quality of representations on downstream tasks, as the data of validation and test patients is not seen during the pre-training phase. Consequently, all datasets across tasks have train, validation, and test split percentages roughly close to 60%, 5%, and 35% respectively. [...] To replicate the state-of-the-art results on the PhysioNet2020 dataset presented by 3KG (Gopal et al., 2021), we followed their detailed procedure: (...) (iv) split the dataset into 80%, 10%, and 10% for training, validation, and testing respectively; (...) For the Chapman dataset, in line with Kiyasseh et al. (2021), (...) split the dataset into 60%, 20%, and 20% for training, validation, and testing respectively. |
| Hardware Specification | Yes | We execute all pre-training and classification tasks using 2 Nvidia V100 (16G) GPUs. However, for the pre-training tasks involving the text domain, we utilize 2 Nvidia A100 (40G) GPUs. |
| Software Dependencies | No | The paper mentions software components like the Huggingface transformers library, BERT, and the AdamW optimizer, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We used a custom BERT model with the number of layers, hidden size, and number of self-attention heads set to 5, 320, and 5 respectively. This model has 15M parameters. We initialize the model weights randomly and follow the BERT (Devlin et al., 2018) pre-training strategy, i.e., Masked Language Modeling (MLM), to learn the representations of the structured EHR sequences. We minimize the MLM loss given by L = -(1/K) Σ_{i=1}^{K} log p(D_{m_i} \| D_{\M}; Θ), where Θ are the parameters of the model, D = {D_0, D_1, ..., D_N} is the sequence of medical codes of length N, M = {m_0, m_1, ..., m_K} are the indices of the masked medical codes, and D_{\M} denotes the set of unmasked medical codes. During training, the medical codes are masked with a probability of 15%, and the model is trained with the AdamW (Loshchilov & Hutter, 2019) optimizer and a batch size of 512 for 100 epochs. We set an initial learning rate of 5e-4, and the learning rate is reduced by a factor of 2 if the validation loss stops decreasing continuously for 2 epochs. (...) Following Zhang et al. (2022), we set τ to 0.1. We assign equal weighting to both directions of contrastive learning, i.e., from ECG to sEHR and sEHR to ECG, and similarly for ECG and text, i.e., (λes, λet) is set to (0.5, 0.5). We used a batch size of 256 and an initial learning rate of 1e-4 for our models. For ECG-only contrastive learning models, we used a batch size of 512 and an initial learning rate of 1e-3. The learning rate is reduced by a factor of 2 if the validation loss stops decreasing continuously for 2 epochs, and we early-stop the training based on validation loss with an early-stopping patience of 10 epochs. (...) For classification tasks, we add a two-layered MLP head on top of the ECG encoder. We also add dropout layers after each hidden layer with a dropout probability of 0.2 for regularisation. A batch size of 128 is used for all classification models. We used an initial learning rate of 1e-3 for random-initialization training for all diseases. For fine-tuning, we used an initial learning rate of 1e-3 for the coronary atherosclerosis and myocarditis tasks, and 1e-4 for the cardiac amyloidosis, pulmonary hypertension, low LVEF, and AFib in NSR tasks. The learning rate is reduced by a factor of 2 if the validation score stops increasing continuously for 2 epochs, and we early-stop the training based on validation loss with an early-stopping patience of 10 epochs. |
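The patient-level split quoted under "Dataset Splits" (a global 60%/5%/35% partition of patients, with every downstream record routed to its patient's split so no patient leaks across splits) can be sketched as below. All function and field names are illustrative, not from the paper:

```python
import random

def patient_level_split(patient_ids, ratios=(0.60, 0.05, 0.35), seed=0):
    """Split *patients* (not individual ECGs/EHRs) into train/val/test,
    mirroring the paper's global 60%/5%/35% patient partition."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test

def assign_records(records, train, val, test):
    """Route each record to the split of its patient, so validation/test
    patients are never seen during pre-training."""
    buckets = {"train": [], "val": [], "test": []}
    for rec in records:
        pid = rec["patient_id"]
        key = "train" if pid in train else ("val" if pid in val else "test")
        buckets[key].append(rec)
    return buckets
```

Because the split is drawn over patients, per-record split fractions only come out "roughly close to" 60%/5%/35%, exactly as the paper notes.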
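The MLM objective in the "Experiment Setup" excerpt (mask each medical code with probability 15%, then average the negative log-likelihood over the K masked positions) can be sketched numerically as follows, assuming a model that already outputs per-position log-probabilities over the code vocabulary; the names here are hypothetical:

```python
import numpy as np

def sample_mask(seq_len, p=0.15, rng=None):
    """Mask each medical code independently with probability p (15% in the paper)."""
    rng = rng or np.random.default_rng(0)
    return rng.random(seq_len) < p

def mlm_loss(log_probs, token_ids, mask):
    """L = -(1/K) * sum_i log p(D_{m_i} | D_\\M; Theta), averaged over the
    K masked positions only; unmasked positions contribute nothing."""
    masked_idx = np.where(mask)[0]
    K = len(masked_idx)
    return -log_probs[masked_idx, token_ids[masked_idx]].sum() / K
```

With a uniform predictive distribution over a vocabulary of size V, the loss reduces to log V, which is a convenient sanity check for an untrained model.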
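The symmetric contrastive objective described above (temperature τ = 0.1 following Zhang et al. (2022), with equal 0.5/0.5 weighting of the ECG→sEHR and sEHR→ECG directions) matches the shape of a CLIP-style InfoNCE loss. This is a generic reconstruction under that assumption, not the authors' code:

```python
import numpy as np

def softmax_xent(logits, targets):
    """Row-wise cross-entropy of `logits` against integer `targets`."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def symmetric_contrastive_loss(ecg_emb, ehr_emb, tau=0.1, lam=(0.5, 0.5)):
    """InfoNCE in both directions; matched ECG/sEHR pairs sit on the diagonal."""
    e = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    s = ehr_emb / np.linalg.norm(ehr_emb, axis=1, keepdims=True)
    logits = e @ s.T / tau            # scaled cosine similarities
    targets = np.arange(len(e))
    return lam[0] * softmax_xent(logits, targets) + lam[1] * softmax_xent(logits.T, targets)
```

The same form would apply to the ECG/text pair with weights (λes, λet) = (0.5, 0.5); perfectly aligned pairs drive the loss toward zero, while mismatched pairings are penalized.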