VOILA: Complexity-Aware Universal Segmentation of CT Images by Voxel Interacting with Language
Authors: Zishuo Wan, Yu Gao, Wanyuan Pang, Dawei Ding
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results indicate the proposed VOILA is capable of achieving improved performance with reduced parameters and computational cost during training. Furthermore, it demonstrates significant generalizability across diverse datasets without additional fine-tuning. Experiments and Results: Datasets. We conduct the experiments on 7 public CT datasets: TotalSegmentator v2 (Wasserthal et al. 2023) (Ts-v2), BTCV (Beyond the Cranial Vault Segmentation Challenge), Pancreas-CT (Roth et al. 2016), WORD (Luo et al. 2022), LiTS (Bilic et al. 2023), AbdomenCT-1K (Ma et al. 2022) (Ab-1K), AMOS (Ji et al. 2022) ... Table 1: Comparison with 3 SOTA methods on 7 public datasets after 400 training epochs. The results are evaluated with average Dice score. |
| Researcher Affiliation | Academia | Zishuo Wan¹, Yu Gao¹, Wanyuan Pang¹, Dawei Ding¹²* — ¹School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; ²Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and processes in detail, such as the Voxel-Language Interaction and Complexity-Aware Sampling, but does not present any structured pseudocode or algorithm blocks. The steps are explained in paragraph form. |
| Open Source Code | Yes | Code: https://github.com/ZishuoWan/VOILA |
| Open Datasets | Yes | Datasets. We conduct the experiments on 7 public CT datasets: TotalSegmentator v2 (Wasserthal et al. 2023) (Ts-v2), BTCV (Beyond the Cranial Vault Segmentation Challenge), Pancreas-CT (Roth et al. 2016), WORD (Luo et al. 2022), LiTS (Bilic et al. 2023), AbdomenCT-1K (Ma et al. 2022) (Ab-1K), AMOS (Ji et al. 2022) |
| Dataset Splits | Yes | All datasets were randomly split into training and testing sets with a 1:1 ratio, except for the Totalsegmentator dataset, which used the official split. |
| Hardware Specification | Yes | All experiments were conducted using the PyTorch platform and trained/tested on 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | All experiments were conducted using the PyTorch platform and trained/tested on 8 NVIDIA GeForce RTX 3090 GPUs. Parameters from the pre-trained CLIP text encoder were frozen. The paper mentions PyTorch and CLIP but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | All images were pre-processed by resampling to a spacing of (1.5, 1.5, 1.5), cropping to the non-zero area, and Z-score normalization. The networks were trained for 400 epochs using the Adam optimizer with a learning rate of 1×10⁻³. Parameters from the pre-trained CLIP text encoder were frozen. Data augmentation was applied, including random flipping, rotating, zooming, intensity adjusting, and patch cropping with a size of 128×128×128. Accordingly, sliding-window prediction was performed during inference. The dimensions of the voxel and text tokens were first reduced to 32 before interaction. For the CAS module, we sampled 10% of voxels during training, with an oversampling ratio of n = 2. |
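The pre-processing quoted in the Experiment Setup row (crop to non-zero area, Z-score normalization) can be sketched in NumPy. This is an illustrative reconstruction, not the authors' released code; the resampling to (1.5, 1.5, 1.5) mm spacing is omitted for brevity, and the function names are our own.

```python
import numpy as np

def crop_to_nonzero(volume: np.ndarray) -> np.ndarray:
    """Crop a 3D volume to the bounding box of its non-zero voxels."""
    nonzero_idx = np.nonzero(volume)
    slices = tuple(slice(idx.min(), idx.max() + 1) for idx in nonzero_idx)
    return volume[slices]

def zscore_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score normalization: zero mean, unit variance over the volume."""
    return (volume - volume.mean()) / (volume.std() + eps)

# Toy CT-like volume: an 8x8x8 scan with a 4x4x4 region of tissue intensities.
vol = np.zeros((8, 8, 8), dtype=np.float32)
vol[2:6, 2:6, 2:6] = np.random.default_rng(0).normal(100.0, 10.0, (4, 4, 4))

cropped = crop_to_nonzero(vol)   # drops the all-zero border
norm = zscore_normalize(cropped) # mean ~0, std ~1
```

In practice the same crop would be applied jointly to the image and its label map so they stay aligned, and normalization statistics are computed per scan.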
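The table notes that CAS (Complexity-Aware Sampling) is described only in prose, with 10% of voxels sampled and an oversampling ratio of n = 2. The sketch below is a hypothetical interpretation, assuming "oversampling ratio n" means drawing n·k uniform candidates and keeping the k with the highest complexity scores; the function name, the use of a per-voxel score map, and the selection rule are all our assumptions, not the authors' implementation.

```python
import numpy as np

def complexity_aware_sample(score_map: np.ndarray,
                            sample_ratio: float = 0.10,
                            oversample_n: int = 2,
                            seed: int = 0) -> np.ndarray:
    """Hypothetical CAS-style sampler: return flat indices of ~sample_ratio
    of the voxels, biased toward high-complexity (high-score) voxels by
    drawing oversample_n * k uniform candidates and keeping the top k."""
    flat = score_map.ravel()
    k = max(1, int(flat.size * sample_ratio))
    rng = np.random.default_rng(seed)
    n_candidates = min(flat.size, oversample_n * k)
    candidates = rng.choice(flat.size, size=n_candidates, replace=False)
    # Keep the k candidates with the largest complexity scores.
    order = np.argsort(flat[candidates])[::-1]
    return candidates[order[:k]]

# Toy per-voxel complexity map on a 10x10x10 grid.
scores = np.arange(1000, dtype=np.float32).reshape(10, 10, 10)
idx = complexity_aware_sample(scores)  # 10% of 1000 voxels -> 100 indices
```

The oversampling step is what biases selection toward hard voxels while keeping some randomness, which matches the stated goal of reducing computation during training.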