Part-aware Prompted Segment Anything Model for Adaptive Segmentation

Authors: Chenhui Zhao, Liyue Shen

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | P2SAM improves performance by +8.0% and +2.0% mean Dice score on two different patient-adaptive segmentation applications, respectively. In addition, P2SAM exhibits impressive generalizability in other adaptive segmentation tasks in the natural image domain, e.g., +6.4% mIoU on the personalized object segmentation task. The code is available at https://github.com/Zch0414/p2sam
Researcher Affiliation | Academia | Chenhui Zhao, Department of Computer Science and Engineering, University of Michigan; Liyue Shen, Department of Electrical and Computer Engineering, University of Michigan
Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Zch0414/p2sam
Open Datasets | Yes | We utilize a total of four medical datasets, including two internal datasets: the NSCLC-Radiomics dataset (Aerts et al., 2015), collected for non-small cell lung cancer (NSCLC) segmentation, contains data from 422 patients; each patient has a computed tomography (CT) volume along with corresponding segmentation annotations. The Kvasir-SEG dataset (Jha et al., 2020) contains 1000 labeled endoscopy polyp images. Two external datasets come from different institutions: the 4D-Lung dataset (Hugo et al., 2016), collected for longitudinal analysis, contains data from 20 patients, of whom 13 underwent multiple visits (3 to 8 visits each); for each visit, a CT volume along with corresponding segmentation labels is available. The CVC-ClinicDB dataset (Bernal et al., 2015) contains 612 labeled polyp images selected from 29 endoscopy videos.
Dataset Splits | Yes | Each dataset was randomly split into three subsets: training, validation, and testing, with an 80:10:10 percent ratio (patient-wise splitting for the NSCLC-Radiomics dataset to prevent data leakage).
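The patient-wise splitting mentioned above can be sketched as follows. This is a minimal illustration of the stated 80:10:10 ratio, not the paper's actual splitting code; the function name `patient_wise_split` and the fixed seed are assumptions for reproducibility of the example.

```python
import random

def patient_wise_split(patient_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split at the patient level so every volume/slice from one patient
    lands in exactly one subset, preventing cross-split data leakage.
    Illustrative sketch only; the paper does not publish its split code."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for the example
    n = len(ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    # Remaining patients form the test set, so the three subsets cover all ids.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# For the 422 NSCLC-Radiomics patients this yields a 337/42/43 split.
train, val, test = patient_wise_split(range(422))
```

The key design point is that shuffling and slicing happen over patient identifiers, not individual CT slices, so no patient contributes data to more than one subset.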
Hardware Specification | Yes | All experiments are conducted on A40 GPUs.
Software Dependencies | No | The paper mentions using the AdamW optimizer and SAM as a backbone model but does not specify version numbers for any key software components such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | We fine-tune the model for 36 epochs on the NSCLC-Radiomics dataset and 100 epochs on the Kvasir-SEG dataset with a batch size of 4. The initial learning rate is 1e-4, and the fine-tuning process is guided by cosine learning rate decay, with a linear learning rate warm-up over the first 10% of epochs. We optimize the model with the AdamW optimizer (Loshchilov & Hutter, 2017) (β1=0.9, β2=0.999) and a weight decay of 0.05. We further regularize SAM's encoder with a drop path rate of 0.1.
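The learning-rate schedule in the setup above (linear warm-up over the first 10% of epochs, then cosine decay from 1e-4) can be sketched in plain Python. The function name `lr_at_epoch` and the decay-to-zero floor are assumptions; the paper states only the schedule shape and the initial rate.

```python
import math

def lr_at_epoch(epoch, total_epochs=36, base_lr=1e-4, warmup_frac=0.10):
    """Linear warm-up for the first 10% of epochs, then cosine decay.
    Defaults match the NSCLC-Radiomics fine-tuning run (36 epochs, lr 1e-4);
    this is an illustrative sketch, not the authors' training code."""
    warmup_epochs = max(1, round(total_epochs * warmup_frac))
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these defaults the rate peaks at 1e-4 at the end of warm-up (epoch 3) and decays smoothly toward zero by the final epoch; for the 100-epoch Kvasir-SEG run, pass `total_epochs=100`.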