TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Authors: Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various zero-shot transfer tasks demonstrate the significantly advantageous performance of our TinySAM against counterpart methods. Implementation Details: We utilize the TinyViT-5M (Wu et al. 2022) as the lightweight student image encoder and SAM-H as the teacher model, following prior work (Zhang et al. 2023). 1% of SA-1B dataset is used as the training data for full-stage distillation. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China; 2Huawei Noah's Ark Lab |
| Pseudocode | No | The paper describes methods using equations and figures, but does not contain a clearly labeled pseudocode or algorithm block with structured steps. |
| Open Source Code | Yes | Code: https://github.com/xinghaochen/TinySAM |
| Open Datasets | Yes | Together with the proposed SA-1B dataset, which contains 11 million high-resolution images and more than 1 billion high-quality segmentation masks, SAM shows impressive high quality segmentation ability for objects of any category and shape. We evaluate the zero-shot instance segmentation task for models on the benchmark of COCO (Lin et al. 2014) dataset and LVIS v1 (Gupta, Dollar, and Girshick 2019). We choose a subset of total 23 datasets used in (Kirillov et al. 2023) for efficient evaluation, which contains BBBC038v1 (Caicedo et al. 2019), DOORS (Pugliatti and Topputo 2022), TimberSeg (Fortin et al. 2022) and LVIS (Gupta, Dollar, and Girshick 2019). |
| Dataset Splits | Yes | 1% of SA-1B dataset is used as the training data for full-stage distillation. We evaluate the zero-shot instance segmentation task for models on the benchmark of COCO (Lin et al. 2014) dataset and LVIS v1 (Gupta, Dollar, and Girshick 2019). To make fair comparison, we follow the settings of SAM (Kirillov et al. 2023) to sample the images and masks, and the first N masks in the corresponding split are used in the evaluation. Evaluation on the first 100 images of COCO val2017 set. |
| Hardware Specification | Yes | The latency is tested with TensorRT on NVIDIA T4 GPU. Latency benchmarks are conducted on a single NVIDIA V100 GPU for everything mode. |
| Software Dependencies | No | The paper mentions TensorRT but does not specify a version number. The Adam optimizer is also mentioned, but without version details for the software library or framework implementing it. |
| Experiment Setup | Yes | We utilize the TinyViT-5M (Wu et al. 2022) as the lightweight student image encoder and SAM-H as the teacher model, following prior work (Zhang et al. 2023). 1% of SA-1B dataset is used as the training data for full-stage distillation. We adopt Adam optimizer and train the student network for 8 epochs. For each iteration, we sample 64 prompts according to hard prompt sampling strategy. For post-training quantization, we set θl = 0.01, θu = 1.2, n = 100, rounds = 3 for iterative search. We calibrate quantized model on SA-1B dataset using 8 images. The threshold values used in the everything mode are all kept the same as default. |
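The distillation setup quoted above (TinyViT-5M student distilled from a SAM-H teacher, 64 hard-sampled prompts per iteration, 1% of SA-1B, 8 epochs with Adam) is reported only in prose. A minimal NumPy sketch of the two core pieces follows; the MSE formulation, the `alpha` weight, and the top-k reading of hard prompt sampling are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hyperparameters quoted from the paper's reported setup
EPOCHS = 8
PROMPTS_PER_ITER = 64   # prompts sampled per iteration
DATA_FRACTION = 0.01    # 1% of SA-1B used for full-stage distillation

def distill_loss(student_emb, teacher_emb, student_out, teacher_out, alpha=1.0):
    """Full-stage distillation: match intermediate image embeddings and
    final outputs. MSE and the alpha weight are illustrative assumptions."""
    emb_loss = np.mean((student_emb - teacher_emb) ** 2)
    out_loss = np.mean((student_out - teacher_out) ** 2)
    return emb_loss + alpha * out_loss

def hard_prompt_sample(per_prompt_losses, k=PROMPTS_PER_ITER):
    """Keep the k prompts on which the student currently does worst --
    a simplified top-k reading of 'hard prompt sampling'."""
    order = np.argsort(per_prompt_losses)[::-1]
    return order[:k]
```

In a real training loop, `per_prompt_losses` would come from evaluating the student's masks against the teacher's on a larger candidate prompt pool, then keeping the 64 hardest.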
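The post-training quantization step is described only by its hyperparameters (θl = 0.01, θu = 1.2, n = 100, rounds = 3). One plausible reading, sketched here in NumPy, is a greedy scale search: start from max-abs calibration, then for each round scan n candidate multipliers in [θl, θu] of the current best scale and keep whichever minimizes reconstruction error. The symmetric 8-bit quantizer and the greedy update rule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Symmetric uniform quantize-dequantize (an assumed quantizer form)."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(x, theta_l=0.01, theta_u=1.2, n=100, rounds=3, bits=8):
    """Iterative scale search using the paper's reported hyperparameters."""
    qmax = 2 ** (bits - 1) - 1
    best = np.abs(x).max() / qmax          # max-abs calibration start
    best_err = np.mean((x - quantize(x, best, bits)) ** 2)
    for _ in range(rounds):
        # scan n multipliers of the current best scale in [theta_l, theta_u]
        for s in np.linspace(theta_l, theta_u, n) * best:
            err = np.mean((x - quantize(x, s, bits)) ** 2)
            if err < best_err:
                best, best_err = s, err
    return best
```

In the paper this search would run per tensor, with statistics gathered from the 8 SA-1B calibration images; here it is shown on a single array for clarity.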