Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners
Authors: Manh Pham Hung, Aaqib Saeed, Dong Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five public datasets across diverse downstream tasks demonstrate that D-BETA significantly outperforms existing methods, achieving an average AUC improvement of 15% in linear probing with only 1% of training data, and 2% in zero-shot performance (requiring no training data), over state-of-the-art models. These results highlight the effectiveness of D-BETA, underscoring its potential to advance automated clinical diagnostics through multi-modal representations. |
| Researcher Affiliation | Academia | ¹Singapore Management University, ²Eindhoven University of Technology. Correspondence to: Dong Ma <EMAIL>. |
| Pseudocode | No | The paper describes the methodology and architecture in detail using textual descriptions and a block diagram (Figure 1), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and checkpoint are made available at https://github.com/manhph2211/D-BETA. |
| Open Datasets | Yes | In the pre-training stage, we utilize the MIMIC-IV-ECG v1.0 database (Gow et al., 2023), which includes 800,035 paired samples derived from 161,352 unique subjects. ... We evaluate our pre-trained encoders on five widely-used public datasets: PhysioNet 2021 (Reyna et al., 2021), PTB-XL (Wagner et al., 2020), CSN (Zheng et al., 2022), CPSC2018 (Liu et al., 2018), and CODE-test (Ribeiro et al., 2020). |
| Dataset Splits | Yes | We follow (Liu et al., 2024b) to split this dataset into four sub-groups (super, sub, form, and rhythm). We treat them as four separate datasets and prepare each with the same train, val, and test sets as in the original paper (Wagner et al., 2020). ... For CSN: this dataset consists of 23,026 ECG recordings sampled at 500 Hz for 10 seconds with 38 distinct labels, which also supports evaluation as a classification task. We use a 70%:10%:20% data split as processed in (Liu et al., 2024b). ... Table 9. Details on data configurations for the five evaluated datasets. Here, LP and ZS denote linear probing and zero-shot respectively, while FFT means full fine-tuning. |
| Hardware Specification | Yes | The quantitative experiments are conducted on a single NVIDIA H100-80GB GPU. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer, Flan-T5 model, Flan-T5 tokenizer, FAISS library, and GPT-4o. However, it does not provide specific version numbers for these software components or the underlying frameworks (e.g., PyTorch, TensorFlow) used for implementation. |
| Experiment Setup | Yes | For model training, we use the Adam optimizer with a learning rate of 5e-5 and a tri-stage scheduler with ratios of 0.1, 0.4, and 0.5 for learning rate adjustments. The optimizer is configured with β1 = 0.9, β2 = 0.98, an epsilon value of 1e-6, and a weight decay of 0.01. We pre-train the proposed model for 300,000 steps, maintaining a batch size of 128. ... Table 10. Details on training configurations for the fine-tuned datasets. For the optimizer, we keep using Adam in all experiments. |
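The tri-stage schedule quoted above (phase ratios 0.1/0.4/0.5 of the 300,000 pre-training steps, peak LR 5e-5) can be sketched as below. The phase ratios, peak learning rate, and step count come from the paper; the linear-warmup / constant-hold / exponential-decay shape and the `init_scale`/`final_scale` factors are assumptions modeled on common (fairseq-style) tri-stage defaults, not details the paper specifies.

```python
import math

def tri_stage_lr(step, total_steps=300_000, peak_lr=5e-5,
                 ratios=(0.1, 0.4, 0.5), init_scale=0.01, final_scale=0.01):
    """Tri-stage LR schedule: linear warmup -> constant hold -> exponential decay.

    `ratios`, `peak_lr`, and `total_steps` follow the paper; `init_scale` and
    `final_scale` are assumed (fairseq-style) defaults.
    """
    warmup_steps = int(ratios[0] * total_steps)
    hold_steps = int(ratios[1] * total_steps)
    if step < warmup_steps:
        # Stage 1: ramp linearly from init_scale * peak_lr up to peak_lr.
        frac = step / max(1, warmup_steps)
        return peak_lr * (init_scale + (1.0 - init_scale) * frac)
    if step < warmup_steps + hold_steps:
        # Stage 2: hold at the peak learning rate.
        return peak_lr
    # Stage 3: decay exponentially toward final_scale * peak_lr.
    decay_steps = total_steps - warmup_steps - hold_steps
    frac = (step - warmup_steps - hold_steps) / max(1, decay_steps)
    return peak_lr * math.exp(math.log(final_scale) * frac)
```

With these ratios, warmup covers steps 0–30,000, the hold phase runs to step 150,000, and the remaining 150,000 steps decay the rate toward its floor.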