Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
Authors: Zhongyi Shui, Jianpeng Zhang, Weiwei Cao, Sinuo Wang, Ruizhe Guo, Le Lu, Lin Yang, Xianghua Ye, Tingbo Liang, Qi Zhang, Ling Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We curated the largest CT dataset to date, comprising imaging and report data from 69,086 patients, and conducted a comprehensive evaluation of 54 major and important disease diagnosis tasks (including several of the most deadly cancers) across 15 main anatomies. Experimental results demonstrate the substantial potential of fVLM in versatile medical image interpretation. In the zero-shot classification task, we achieved an average AUC of 81.3% on 54 diagnosis tasks, surpassing CLIP and supervised methods by 12.9% and 8.0%, respectively. Additionally, on the publicly available CT-RATE and Rad-ChestCT benchmarks, our fVLM outperformed the current state-of-the-art methods with absolute AUC gains of 7.4% and 4.8%, respectively. |
| Researcher Affiliation | Collaboration | 1. DAMO Academy, Alibaba Group; 2. The First Affiliated Hospital of College of Medicine, Zhejiang University, China; 3. Zhejiang University, China; 4. Westlake University, China; 5. Hupan Lab, 310023, China |
| Pseudocode | No | The paper describes the methodology using figures and text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/alibaba-damo-academy/fvlm |
| Open Datasets | Yes | Moreover, on the publicly available CT-RATE and Rad-ChestCT datasets, our fVLM outperforms the state-of-the-art approach by 7.4% and 4.8% absolute AUC value gains, respectively. The details regarding these two datasets can be found in Hamamci et al. (2024) and Draelos et al. (2021). |
| Dataset Splits | Yes | We randomly split the dataset into training, validation and test sets of 64,476, 1,151, and 3,459 patients, respectively. |
| Hardware Specification | No | The paper states: 'All experiments are conducted on 8 NVIDIA A100 GPUs' in Appendix A.2, but it does not specify other hardware details such as CPU model, memory, or clock speeds. |
| Software Dependencies | No | The paper mentions specific models and tools like "vision transformer (ViT)", "BERT", "TotalSegmentator", and "Qwen 2.5", and specifies the "AdamW optimizer". However, it does not provide specific version numbers for general software libraries or programming languages (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For MedVL-CT69K, the encoder for fVLM is initialized with an R-50 vision transformer (ViT) pre-trained on ImageNet. We train the vision-language model for 100 epochs, using a batch size of 256. The learning rate is initialized to 1e-4 and is decayed by a factor of 0.1 at 60 and 90 epochs. We use the AdamW optimizer with a weight decay of 0.05. For fine-tuning, the learning rate is set to 2e-5. |
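The reported pre-training schedule (initial LR 1e-4, decayed by 0.1 at epochs 60 and 90 over 100 epochs) can be sketched as a plain step-decay function. This is a minimal illustration, not code from the released repository; the helper name `learning_rate_at` and the standalone-function form are assumptions for clarity.

```python
def learning_rate_at(epoch, base_lr=1e-4, milestones=(60, 90), gamma=0.1):
    """Step-decay schedule as reported in the paper's setup (hypothetical helper):
    start at base_lr and multiply by gamma at each milestone epoch reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Example values over the 100-epoch run:
# epochs 0-59 -> 1e-4, epochs 60-89 -> 1e-5, epochs 90-99 -> 1e-6
```

In a PyTorch training loop this corresponds to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 90], gamma=0.1)` wrapped around an `AdamW` optimizer with `weight_decay=0.05`.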