AoP-SAM: Automation of Prompts for Efficient Segmentation

Authors: Yi Chen, Muyoung Son, Chuanbo Hua, Joo-Young Kim

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on three datasets demonstrate that AoP-SAM substantially improves both prompt generation efficiency and mask generation accuracy, making SAM more effective for automated segmentation tasks.
Researcher Affiliation | Academia | Yi Chen, Muyoung Son, Chuanbo Hua, Joo-Young Kim. KAIST, Korea Advanced Institute of Science and Technology, Daejeon, 34141, South Korea. EMAIL
Pseudocode | No | The paper describes the Prompt Predictor architecture and the Adaptive Sampling and Filtering (ASF) technique in detail through descriptive text, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the proposed AoP-SAM method, nor does it provide a link to a code repository.
Open Datasets | Yes | In this study, we use three key datasets: SA-1B, COCO, and LVIS. The SA-1B dataset, used for training SAM, contains over 1 million images and 1 billion masks (Kirillov et al. 2023). The COCO dataset includes 41,000 images and 200,000 masks, covering a wide range of common objects (Lin et al. 2014). LVIS, designed for long-tail distributions, provides 5,000 images and 25,000 masks, emphasizing fine-grained categories (Gupta, Dollar, and Girshick 2019).
Dataset Splits | No | Note that AoP-SAM is trained on a subset of SA-1B and tested on a separate test set; similarly, all the comparative methods we employ are also trained and tested on different sets.
Hardware Specification | Yes | The experiments are conducted using the PyTorch framework on a single Nvidia Titan RTX GPU.
Software Dependencies | No | The experiments are conducted using the PyTorch framework on a single Nvidia Titan RTX GPU. (Mentions PyTorch but no version number.)
Experiment Setup | Yes | The training process involved iterative refinement of the model's parameters using an MSE loss function, with optimization carried out via the Adam optimizer. We employed a learning rate of and trained the model for 1000 epochs, using gradient accumulation to handle larger batch sizes effectively. For coarsely sampling point prompts from the Prompt Confidence Map, we first apply a Smoothing Factor=2, a Confidence Intensity Threshold=0.2, and a Prompt Spacing Factor=2 as initialized parameters.
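The paper releases no code, so the training recipe above (MSE loss, Adam, gradient accumulation over micro-batches) can only be illustrated loosely. The sketch below is a toy, dependency-free Python version of that recipe on a scalar linear model: it hand-rolls the Adam update and applies it only after accumulating gradients across several micro-batches, emulating a larger effective batch size. The toy model, data, learning rate (the paper's value is missing from the extracted text), and accumulation step count are all our assumptions, not the authors' settings.

```python
# Toy sketch (NOT the authors' code): MSE loss + Adam with gradient
# accumulation, fitting y = w * x on synthetic data. The real AoP-SAM
# prompt predictor is a neural network trained in PyTorch.

def mse_grad(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    n = len(batch)
    return sum(2 * (w * x - y) * x for x, y in batch) / n

def train(data, epochs=1000, accum_steps=4, lr=0.01,
          beta1=0.9, beta2=0.999, eps=1e-8):
    w, m, v, t = 0.0, 0.0, 0.0, 0
    # Split the data into micro-batches of 2 samples each.
    micro = [data[i:i + 2] for i in range(0, len(data), 2)]
    for _ in range(epochs):
        g_accum, steps = 0.0, 0
        for batch in micro:
            g_accum += mse_grad(w, batch)   # accumulate, don't update yet
            steps += 1
            if steps == accum_steps:        # one Adam step per "large" batch
                g = g_accum / accum_steps
                t += 1
                m = beta1 * m + (1 - beta1) * g
                v = beta2 * v + (1 - beta2) * g * g
                m_hat = m / (1 - beta1 ** t)    # bias-corrected moments
                v_hat = v / (1 - beta2 ** t)
                w -= lr * m_hat / (v_hat ** 0.5 + eps)
                g_accum, steps = 0.0, 0
    return w

data = [(x, 3.0 * x) for x in range(1, 9)]  # true slope is 3
w = train(data)
print(round(w, 2))
```

Accumulating over `accum_steps` micro-batches before each optimizer step is the standard way to fit a large effective batch into limited GPU memory, which matches the paper's single-Titan-RTX setup.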