Part-aware Prompted Segment Anything Model for Adaptive Segmentation
Authors: Chenhui Zhao, Liyue Shen
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | P2SAM improves the performance by +8.0% and +2.0% mean Dice score for two different patient-adaptive segmentation applications, respectively. In addition, P2SAM also exhibits impressive generalizability in other adaptive segmentation tasks in the natural image domain, e.g., +6.4% mIoU within the personalized object segmentation task. The code is available at https://github.com/Zch0414/p2sam |
| Researcher Affiliation | Academia | Chenhui Zhao, Department of Computer Science and Engineering, University of Michigan; Liyue Shen, Department of Electrical and Computer Engineering, University of Michigan |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Zch0414/p2sam |
| Open Datasets | Yes | We utilize a total of four medical datasets, including two internal datasets: The NSCLC-Radiomics dataset (Aerts et al., 2015), collected for non-small cell lung cancer (NSCLC) segmentation, contains data from 422 patients. Each patient has a computed tomography (CT) volume along with corresponding segmentation annotations. The Kvasir-SEG dataset (Jha et al., 2020) contains 1000 labeled endoscopy polyp images. Two external datasets from different institutions: The 4D-Lung dataset (Hugo et al., 2016), collected for longitudinal analysis, contains data from 20 patients, within which 13 patients underwent multiple visits, 3 to 8 visits for each patient. For each visit, a CT volume along with corresponding segmentation labels is available. The CVC-ClinicDB dataset (Bernal et al., 2015) contains 612 labeled polyp images selected from 29 endoscopy videos. |
| Dataset Splits | Yes | Each dataset was randomly split into three subsets: training, validation, and testing, with an 80:10:10 percent ratio (patient-wise splitting for the NSCLC-Radiomics dataset to prevent data leak). |
| Hardware Specification | Yes | All experiments are conducted on A40 GPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and SAM as a backbone model but does not specify version numbers for any key software components like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We fine-tune the model for 36 epochs on the NSCLC-Radiomics dataset and 100 epochs on the Kvasir-SEG dataset with a batch size of 4. The initial learning rate is 1e-4, and the fine-tuning process is guided by cosine learning rate decay, with a linear learning rate warm-up over the first 10 percent of epochs. We optimize the model by the AdamW optimizer (Loshchilov & Hutter, 2017) (β1=0.9, β2=0.999), with a weight decay of 0.05. We further penalize SAM's encoder with a drop path of 0.1. |
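The quoted recipe pairs a linear warm-up (first 10 percent of epochs) with cosine learning rate decay from a base rate of 1e-4. A minimal sketch of that schedule in plain Python, assuming the warm-up ramps from zero and the decay ends at zero (the paper does not state the exact endpoints, and the function name is our own, not from the authors' code):

```python
import math

def lr_at_epoch(epoch, total_epochs, base_lr=1e-4, warmup_frac=0.1):
    """Linear warm-up over the first `warmup_frac` of training,
    then cosine decay over the remaining epochs."""
    warmup_epochs = warmup_frac * total_epochs
    if epoch < warmup_epochs:
        # Linear warm-up from 0 toward base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With `total_epochs=36` (the NSCLC-Radiomics setting), the rate rises during the first ~3.6 epochs and then anneals toward zero; in practice this schedule would be passed to AdamW via a framework scheduler rather than computed by hand.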