From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining

Authors: Fuying Wang, Jiacheng Xu, Lequan Yu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning. Experimental results demonstrate that MELP outperforms existing SSL methods, underscoring its effectiveness and adaptability across diverse clinical applications. Our code is available at https://github.com/HKU-MedAI/MELP.
Researcher Affiliation | Academia | *Equal contribution. School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China. Correspondence to: Lequan Yu <EMAIL>.
Pseudocode | No | The paper describes the MELP model and its multi-scale approach with diagrams (Figure 2) and textual explanations, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Our code is available at https://github.com/HKU-MedAI/MELP.
Open Datasets | Yes | We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning. We evaluate our pre-trained MELP across three publicly available benchmarks: PTB-XL (Wagner et al., 2020), CSN (Zheng et al., 2022), and CPSC2018 (Liu et al., 2018). For the pretraining stage, we utilize the MIMIC-IV-ECG v1.0 database (Gow et al., 2023).
Dataset Splits | Yes | Training, validation, and test splits adhere to the protocol established by Wagner et al. (2020). Table 1 details the number of samples in each split for each downstream dataset. We conducted linear probing using 1%, 10%, and 100% of the training data for each task, following Liu et al. (2024a).
Hardware Specification | Yes | All experiments are conducted on four NVIDIA GTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the AdamW optimizer and the Wav2Vec 2.0 architecture, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | We use the AdamW optimizer with an initial learning rate of 2e-4, a weight decay of 0.2, and a cosine annealing learning rate scheduler. MELP is pretrained for 100 epochs with a per-device batch size of 64. Training is stopped early if the zero-shot prediction performance on the validation sets does not improve for five consecutive epochs. Please refer to our code for more details. All experiments are conducted on four NVIDIA GTX 3090 GPUs. ... For downstream training, we use a batch size of 128 and train for 50 epochs, with early stopping similarly triggered based on the validation AUC.
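The reported training schedule combines cosine annealing of the learning rate with patience-based early stopping. A minimal sketch of both pieces is given below; this is an illustration of the described protocol, not the authors' code (for that, see the linked repository), and the function and class names are our own:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=2e-4, lr_min=0.0):
    """Cosine-annealed learning rate for a given epoch, matching the
    standard schedule (e.g. PyTorch's CosineAnnealingLR): starts at
    lr_max (the paper's initial rate, 2e-4) and decays to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

class EarlyStopper:
    """Signal a stop when the validation score (e.g. zero-shot performance
    or AUC, as in the paper) fails to improve for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        # Reset the counter on improvement; otherwise count a bad epoch.
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

With 100 pretraining epochs, `cosine_annealing_lr(0, 100)` returns the initial rate 2e-4 and decays smoothly toward zero by the final epoch; `EarlyStopper(patience=5)` reproduces the five-epoch stopping criterion.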