Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
EchoGPT: An Interactive Cardiac Function Assessment Model for Echocardiogram Videos
Authors: Bo Xu, Quanhao Zhu, Qingchen Zhang, Mengmeng Wang, Liang Zhao, Hongfei Lin, Jing Ren, Feng Xia
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate EchoGPT's superior accuracy in predicting LVEF compared to other models, and positive feedback from professional physicians through questionnaire surveys, validating its potential in practical applications. |
| Researcher Affiliation | Academia | ¹Dalian University of Technology, China; ²Hainan University, China; ³Zhejiang University of Technology, China; ⁴RMIT University, Australia |
| Pseudocode | Yes | Algorithm 1: Forward Propagation of EchoGPT |
| Open Source Code | Yes | The demo is available at https://github.com/zhuqh19/EchoGPT. |
| Open Datasets | Yes | Table 1 presents a detailed overview of the publicly available echocardiogram video dataset EchoNet-Dynamic. |
| Dataset Splits | Yes | Specifically, we selected models such as EchoCLIP, CLIP, BioMedCLIP, and PubMedCLIP [Christensen et al., 2024; Radford et al., 2021; Zhang et al., 2024b; Eslami et al., 2023], and applied them to over 1200 test videos from the EchoNet-Dynamic dataset to predict left ventricular ejection fraction. |
| Hardware Specification | Yes | All experiments were conducted in a high-performance computing environment equipped with four RTX-4090 GPUs and an Intel(R) Xeon(R) Platinum 8336C CPU, ensuring the sufficiency of computational resources and the reliability of the experimental results. |
| Software Dependencies | Yes | the large language model Llama2-7B-Chat |
| Experiment Setup | Yes | Regarding training details, both stages maintained a batch size of 2 and utilized the AdamW optimizer with a cosine learning rate scheduler, setting the learning rate to 1e-4. The visual backbone was the Vision Transformer with frozen weights. The linear projection layer was trained from scratch, and the LoRA (Low-Rank Adaptation) method [Hu et al., 2021] was used for efficient fine-tuning of the large language model Llama2-7B-Chat. Specifically, the Wq and Wv components were fine-tuned, with the rank (r) set to 64 and the LoRA-alpha value equal to 16. The entire model maintained a consistent image resolution of 224×224 pixels throughout all stages to ensure uniformity. |
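
The test videos referenced in the Dataset Splits row come from the standard EchoNet-Dynamic distribution, which ships a `FileList.csv` whose `Split` column marks each video as TRAIN, VAL, or TEST. Below is a minimal sketch of selecting that held-out split, assuming the standard file layout; the local path is hypothetical.

```python
import pandas as pd

# EchoNet-Dynamic's FileList.csv assigns each video to TRAIN, VAL, or TEST;
# the path below is a hypothetical local location of the downloaded dataset.
file_list = pd.read_csv("EchoNet-Dynamic/FileList.csv")

# The evaluation quoted above uses the held-out test split (over 1,200
# videos) together with the ground-truth ejection fraction labels ("EF").
test_set = file_list[file_list["Split"] == "TEST"]
print(len(test_set), "test videos")
print(test_set[["FileName", "EF"]].head())
```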
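
The LoRA and optimizer hyperparameters quoted in the Experiment Setup row map directly onto a standard parameter-efficient fine-tuning configuration. Below is a minimal sketch assuming the Hugging Face `transformers` and `peft` libraries; the module names `q_proj`/`v_proj`, the model identifier, and the training step count are illustrative assumptions, not details confirmed by the paper.

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup
from peft import LoraConfig, get_peft_model

# LoRA on the query/value projections of Llama2-7B-Chat, matching the
# reported rank r = 64 and LoRA-alpha = 16 ("q_proj"/"v_proj" are the
# usual Hugging Face names for Wq and Wv; assumed, not confirmed).
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = get_peft_model(model, lora_config)

# AdamW with a cosine learning-rate schedule at lr = 1e-4, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
num_training_steps = 10_000  # placeholder; the paper's step count is not quoted here
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
```

Note that `get_peft_model` freezes all base-model weights so that only the LoRA adapters receive gradient updates, mirroring the frozen-backbone setup described in the quote.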