Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EchoGPT: An Interactive Cardiac Function Assessment Model for Echocardiogram Videos

Authors: Bo Xu, Quanhao Zhu, Qingchen Zhang, Mengmeng Wang, Liang Zhao, Hongfei Lin, Jing Ren, Feng Xia

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate EchoGPT's superior accuracy in predicting LVEF compared to other models, and positive feedback from professional physicians through questionnaire surveys, validating its potential in practical applications.
Researcher Affiliation | Academia | 1 Dalian University of Technology, China; 2 Hainan University, China; 3 Zhejiang University of Technology, China; 4 RMIT University, Australia
Pseudocode | Yes | Algorithm 1: Forward Propagation of EchoGPT
Open Source Code | Yes | The demo is available at https://github.com/zhuqh19/EchoGPT.
Open Datasets | Yes | Table 1 presents a detailed overview of the publicly available echocardiogram video dataset EchoNet-Dynamic.
Dataset Splits | Yes | Specifically, we selected models such as EchoCLIP, CLIP, BioMedCLIP, and PubMedCLIP [Christensen et al., 2024; Radford et al., 2021; Zhang et al., 2024b; Eslami et al., 2023], and applied them to over 1200 test videos from the EchoNet-Dynamic dataset to predict left ventricular ejection fraction.
Hardware Specification | Yes | All experiments were conducted in a high-performance computing environment equipped with four RTX-4090 GPUs and an Intel(R) Xeon(R) Platinum 8336C CPU, ensuring the sufficiency of computational resources and the reliability of the experimental results.
Software Dependencies | Yes | the large language model Llama2-7B-Chat
Experiment Setup | Yes | Regarding training details, both stages maintained a batch size of 2 and utilized the AdamW optimizer with a cosine learning rate scheduler, setting the learning rate to 1e-4. The visual backbone was the Vision Transformer with frozen weights. The linear projection layer was trained from scratch, and the LoRA (Low-Rank Adaptation) method [Hu et al., 2021] was used for efficient fine-tuning of the large language model Llama2-7B-Chat. Specifically, the Wq and Wv components were fine-tuned, with the rank (r) set to 64 and the LoRA-alpha value equal to 16. The entire model maintained a consistent image resolution of 224×224 pixels throughout all stages to ensure uniformity.
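The Experiment Setup row above reports a batch size of 2, the AdamW optimizer with a cosine learning-rate schedule at 1e-4, a frozen ViT visual backbone, and LoRA fine-tuning of the Wq and Wv projections of Llama2-7B-Chat with r=64 and alpha=16. The snippet below is a minimal sketch of how such a configuration could be expressed with the Hugging Face PEFT library; it is not the authors' released code, and the target module names ("q_proj", "v_proj"), the Hugging Face model identifier, the warmup length, and the total step count are assumptions inferred from the reported hyperparameters.

```python
# Minimal sketch (not the authors' code) of the reported LoRA fine-tuning setup.
# Module names ("q_proj", "v_proj"), warmup, and total steps are assumptions.
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup
from peft import LoraConfig, get_peft_model

# Llama2-7B-Chat as the language backbone (Hugging Face identifier assumed).
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# LoRA on the query/value projections (Wq, Wv), r=64, alpha=16 as reported.
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.0,          # dropout not reported; assumed 0
    bias="none",
    task_type="CAUSAL_LM",
)
llm = get_peft_model(llm, lora_cfg)   # only the LoRA adapters remain trainable

# AdamW at 1e-4 with a cosine learning-rate schedule, as in the paper.
trainable = [p for p in llm.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,         # warmup not reported; assumed 0
    num_training_steps=10_000,  # placeholder; set to the actual number of steps
)

# Training would then iterate over echocardiogram video/instruction batches of
# size 2, keeping the ViT encoder frozen and updating only the linear
# projection layer and the LoRA adapters, with frames resized to 224x224.
```

This sketch covers only the language-model side of the setup; the frozen Vision Transformer encoder and the linear projection layer described in the paper would be wired in by the full EchoGPT architecture.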