MTSAM: Multi-Task Fine-Tuning for Segment Anything Model

Authors: Xuehao Wang, Zhan Zhuang, Feiyang Ye, Yu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on benchmark datasets substantiate the efficacy of MTSAM in enhancing the performance of multi-task learning. Our code is available at https://github.com/XuehaoWangFi/MTSAM. We conduct comprehensive experiments on benchmark datasets, demonstrating the exceptional performance of the MTSAM framework. In this section, we empirically evaluate the proposed MTSAM on three benchmark datasets, including NYUv2 (Silberman et al., 2012), Cityscapes (Cordts et al., 2016), and PASCAL-Context (Everingham et al., 2010).
Researcher Affiliation | Academia | Southern University of Science and Technology; City University of Hong Kong; University of Technology Sydney
Pseudocode | No | The paper describes methods and formulas (e.g., Eq. (6), Eq. (7), Eq. (8), Eq. (9), Eq. (10)) and provides an overview of the architecture in figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/XuehaoWangFi/MTSAM.
Open Datasets | Yes | In this section, we empirically evaluate the proposed MTSAM on three benchmark datasets, including NYUv2 (Silberman et al., 2012), Cityscapes (Cordts et al., 2016), and PASCAL-Context (Everingham et al., 2010).
Dataset Splits | No | The paper evaluates on the NYUv2, Cityscapes, and PASCAL-Context datasets but does not explicitly detail the training, validation, and test splits used. It does not mention specific percentages, sample counts, or explicit references to predefined splits for reproduction.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the Adam optimizer and specific loss functions but does not provide version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | The batch size is set to 4 for NYUv2 and 8 for Cityscapes and PASCAL-Context. The cross-entropy loss, L1 loss, and cosine similarity loss are used as the loss functions of the semantic segmentation, depth estimation, and surface normal prediction tasks, respectively. The Adam optimizer is used to update fine-tuned parameters. In the Adam optimizer, the initial learning rate is set to 10^-3, a linear learning rate scheduler with warmup is adopted with the warmup rate set to 0.05, and the weight decay is set to 10^-6. The dropout rate is set to 0.1. For the proposed ToRA, we set p = q = 32, v = 8 on the NYUv2 and PASCAL-Context datasets, and p = q = 16, v = 4 on the Cityscapes dataset. The hyper-parameter λ is set to 1. The total number of fine-tuned epochs is set to 200, 50, and 30 for the NYUv2, Cityscapes, and PASCAL-Context datasets, respectively. For the Cityscapes and PASCAL-Context datasets, we use equal weights for each task (i.e., w_i equals 1 in Eq. (8)), while for the NYUv2 dataset, we follow (Lopes et al., 2023) to set the weights of the semantic segmentation, depth estimation, and surface normal prediction tasks to 1, 1, and 4, respectively.
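The optimizer recipe quoted above (Adam at an initial learning rate of 10^-3, linear schedule with a 0.05 warmup rate, weight decay 10^-6) can be sketched as a plain schedule function. This is a minimal illustration of the schedule's shape, not code from the paper's repository; the function name, step granularity, and decay-to-zero endpoint are assumptions.

```python
def linear_warmup_lr(step, total_steps, base_lr=1e-3, warmup_rate=0.05):
    """Learning rate at a given step under linear warmup then linear decay.

    Mirrors the reported setup: warmup_rate = 0.05 of total steps spent
    ramping from 0 to base_lr = 1e-3, then a linear decay (assumed to
    reach 0 at the final step).
    """
    warmup_steps = max(1, int(warmup_rate * total_steps))
    if step < warmup_steps:
        # warmup phase: ramp linearly from 0 up to base_lr
        return base_lr * step / warmup_steps
    # decay phase: ramp linearly from base_lr down to 0
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / remaining)


# Example: a hypothetical run of 1000 optimizer steps.
total = 1000
peak = linear_warmup_lr(50, total)    # end of warmup -> base_lr
start = linear_warmup_lr(0, total)    # first step -> 0
end = linear_warmup_lr(total, total)  # last step -> 0
```

In practice this shape is what PyTorch's `LambdaLR` (or similar schedulers) would compute per step; only the base learning rate, warmup rate, and weight decay are fixed by the paper's description.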