Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners
Authors: Manh Pham Hung, Aaqib Saeed, Dong Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five public datasets across diverse downstream tasks demonstrate that D-BETA significantly outperforms existing methods, achieving an average AUC improvement of 15% in linear probing with only 1% of training data, and 2% in zero-shot performance (requiring no training data), over state-of-the-art models. These results highlight the effectiveness of D-BETA, underscoring its potential to advance automated clinical diagnostics through multi-modal representations. |
| Researcher Affiliation | Academia | ¹Singapore Management University, ²Eindhoven University of Technology. Correspondence to: Dong Ma <EMAIL>. |
| Pseudocode | No | The paper describes the methodology and architecture in detail using textual descriptions and a block diagram (Figure 1), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and checkpoint are made available at https://github.com/manhph2211/D-BETA. |
| Open Datasets | Yes | In the pre-training stage, we utilize the MIMIC-IV-ECG v1.0 database (Gow et al., 2023), which includes 800,035 paired samples derived from 161,352 unique subjects. ... We evaluate our pre-trained encoders on five widely-used public datasets: PhysioNet 2021 (Reyna et al., 2021), PTB-XL (Wagner et al., 2020), CSN (Zheng et al., 2022), CPSC2018 (Liu et al., 2018), and CODE-test (Ribeiro et al., 2020). |
| Dataset Splits | Yes | We follow (Liu et al., 2024b) to split this dataset into four sub-groups (super, sub, form, and rhythm). We treat them as four separate datasets and prepare each with the same train, val, and test sets as in the original paper (Wagner et al., 2020). ... For CSN: this dataset consists of 23,026 ECG recordings sampled at 500 Hz for 10 seconds with 38 distinct labels, which also supports evaluation as a classification task. We use a 70%:10%:20% data split as processed in (Liu et al., 2024b). ... Table 9. Details on data configurations for the five evaluated datasets. Here, LP and ZS denote linear probing and zero-shot respectively, while FFT means full fine-tuning. |
| Hardware Specification | Yes | The quantitative experiments are conducted on a single NVIDIA H100-80GB GPU. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer, Flan-T5 model, Flan-T5 tokenizer, FAISS library, and GPT-4o. However, it does not provide specific version numbers for these software components or the underlying frameworks (e.g., PyTorch, TensorFlow) used for implementation. |
| Experiment Setup | Yes | For model training, we use the Adam optimizer with a learning rate of 5e-5 and a tri-stage scheduler with ratios of 0.1, 0.4, and 0.5 for learning rate adjustments. The optimizer is configured with β1 = 0.9, β2 = 0.98, an epsilon value of 1e-6, and a weight decay of 0.01. We pre-train the proposed model for 300,000 steps, maintaining a batch size of 128. ... Table 10. Details on training configurations for the fine-tuned datasets. For the optimizer, we keep using Adam in all experiments. |
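The tri-stage schedule quoted above (phase ratios 0.1/0.4/0.5 of the 300,000 pre-training steps, peak LR 5e-5) can be sketched as below. The phase ratios, peak learning rate, and step count come from the paper; the linear-warmup / constant-hold / exponential-decay shape and the `init_scale`/`final_scale` factors are assumptions modeled on common (fairseq-style) tri-stage defaults, not details the paper specifies.

```python
import math

def tri_stage_lr(step, total_steps=300_000, peak_lr=5e-5,
                 ratios=(0.1, 0.4, 0.5), init_scale=0.01, final_scale=0.01):
    """Tri-stage LR schedule: linear warmup -> constant hold -> exponential decay.

    `ratios`, `peak_lr`, and `total_steps` follow the paper; `init_scale` and
    `final_scale` are assumed (fairseq-style) defaults.
    """
    warmup_steps = int(ratios[0] * total_steps)
    hold_steps = int(ratios[1] * total_steps)
    if step < warmup_steps:
        # Stage 1: ramp linearly from init_scale * peak_lr up to peak_lr.
        frac = step / max(1, warmup_steps)
        return peak_lr * (init_scale + (1.0 - init_scale) * frac)
    if step < warmup_steps + hold_steps:
        # Stage 2: hold at the peak learning rate.
        return peak_lr
    # Stage 3: decay exponentially toward final_scale * peak_lr.
    decay_steps = total_steps - warmup_steps - hold_steps
    frac = (step - warmup_steps - hold_steps) / max(1, decay_steps)
    return peak_lr * math.exp(math.log(final_scale) * frac)
```

With these ratios, warmup covers steps 0–30,000, the hold phase runs to step 150,000, and the remaining 150,000 steps decay the rate toward its floor.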