VOILA: Complexity-Aware Universal Segmentation of CT Images by Voxel Interacting with Language
Authors: Zishuo Wan, Yu Gao, Wanyuan Pang, Dawei Ding
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results indicate the proposed VOILA is capable of achieving improved performance with reduced parameters and computational cost during training. Furthermore, it demonstrates significant generalizability across diverse datasets without additional fine-tuning. Experiments and Results: Datasets. We conduct the experiments on 7 public CT datasets: TotalSegmentator v2 (Wasserthal et al. 2023) (Ts-v2), BTCV (Beyond the Cranial Vault Segmentation Challenge), Pancreas-CT (Roth et al. 2016), WORD (Luo et al. 2022), LiTS (Bilic et al. 2023), AbdomenCT-1K (Ma et al. 2022) (Ab-1K), AMOS (Ji et al. 2022) ... Table 1: Comparison with 3 SOTA methods on 7 public datasets after 400 training epochs. The results are evaluated with average Dice score. |
| Researcher Affiliation | Academia | Zishuo Wan¹, Yu Gao¹, Wanyuan Pang¹, Dawei Ding¹²* — ¹School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; ²Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and processes in detail, such as the Voxel-Language Interaction and Complexity-Aware Sampling, but does not present any structured pseudocode or algorithm blocks. The steps are explained in paragraph form. |
| Open Source Code | Yes | Code: https://github.com/ZishuoWan/VOILA |
| Open Datasets | Yes | Datasets. We conduct the experiments on 7 public CT datasets: TotalSegmentator v2 (Wasserthal et al. 2023) (Ts-v2), BTCV (Beyond the Cranial Vault Segmentation Challenge), Pancreas-CT (Roth et al. 2016), WORD (Luo et al. 2022), LiTS (Bilic et al. 2023), AbdomenCT-1K (Ma et al. 2022) (Ab-1K), AMOS (Ji et al. 2022) |
| Dataset Splits | Yes | All datasets were randomly split into training and testing sets with a 1:1 ratio, except for the Totalsegmentator dataset, which used the official split. |
| Hardware Specification | Yes | All experiments were conducted using the PyTorch platform and trained/tested on 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | All experiments were conducted using the PyTorch platform and trained/tested on 8 NVIDIA GeForce RTX 3090 GPUs. Parameters from the pre-trained CLIP text encoder were frozen. The paper mentions PyTorch and CLIP but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | All images were pre-processed by resampling to a spacing of (1.5, 1.5, 1.5), cropping to the non-zero area, and Z-score normalization. The networks were trained for 400 epochs using the Adam optimizer with a learning rate of 1×10⁻³. Parameters from the pre-trained CLIP text encoder were frozen. Data augmentation was applied, including random flipping, rotating, zooming, intensity adjusting, and patch cropping with a size of 128×128×128. Accordingly, sliding-window prediction was performed during inference. The dimensions of the voxel and text tokens were first reduced to 32 before interaction. For the CAS module, we sampled 10% of voxels during training, with an oversampling ratio of n = 2. |
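The pre-processing quoted in the Experiment Setup row (crop to non-zero area, Z-score normalization) can be sketched in NumPy. This is an illustrative reconstruction, not the authors' released code; the resampling to (1.5, 1.5, 1.5) mm spacing is omitted for brevity, and the function names are our own.

```python
import numpy as np

def crop_to_nonzero(volume: np.ndarray) -> np.ndarray:
    """Crop a 3D volume to the bounding box of its non-zero voxels."""
    nonzero_idx = np.nonzero(volume)
    slices = tuple(slice(idx.min(), idx.max() + 1) for idx in nonzero_idx)
    return volume[slices]

def zscore_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score normalization: zero mean, unit variance over the volume."""
    return (volume - volume.mean()) / (volume.std() + eps)

# Toy CT-like volume: an 8x8x8 scan with a 4x4x4 region of tissue intensities.
vol = np.zeros((8, 8, 8), dtype=np.float32)
vol[2:6, 2:6, 2:6] = np.random.default_rng(0).normal(100.0, 10.0, (4, 4, 4))

cropped = crop_to_nonzero(vol)   # drops the all-zero border
norm = zscore_normalize(cropped) # mean ~0, std ~1
```

In practice the same crop would be applied jointly to the image and its label map so they stay aligned, and normalization statistics are computed per scan.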
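The table notes that CAS (Complexity-Aware Sampling) is described only in prose, with 10% of voxels sampled and an oversampling ratio of n = 2. The sketch below is a hypothetical interpretation, assuming "oversampling ratio n" means drawing n·k uniform candidates and keeping the k with the highest complexity scores; the function name, the use of a per-voxel score map, and the selection rule are all our assumptions, not the authors' implementation.

```python
import numpy as np

def complexity_aware_sample(score_map: np.ndarray,
                            sample_ratio: float = 0.10,
                            oversample_n: int = 2,
                            seed: int = 0) -> np.ndarray:
    """Hypothetical CAS-style sampler: return flat indices of ~sample_ratio
    of the voxels, biased toward high-complexity (high-score) voxels by
    drawing oversample_n * k uniform candidates and keeping the top k."""
    flat = score_map.ravel()
    k = max(1, int(flat.size * sample_ratio))
    rng = np.random.default_rng(seed)
    n_candidates = min(flat.size, oversample_n * k)
    candidates = rng.choice(flat.size, size=n_candidates, replace=False)
    # Keep the k candidates with the largest complexity scores.
    order = np.argsort(flat[candidates])[::-1]
    return candidates[order[:k]]

# Toy per-voxel complexity map on a 10x10x10 grid.
scores = np.arange(1000, dtype=np.float32).reshape(10, 10, 10)
idx = complexity_aware_sample(scores)  # 10% of 1000 voxels -> 100 indices
```

The oversampling step is what biases selection toward hard voxels while keeping some randomness, which matches the stated goal of reducing computation during training.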