MTSAM: Multi-Task Fine-Tuning for Segment Anything Model

Authors: Xuehao Wang, Zhan Zhuang, Feiyang Ye, Yu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on benchmark datasets substantiate the efficacy of MTSAM in enhancing the performance of multi-task learning. Our code is available at https://github.com/XuehaoWangFi/MTSAM. We conduct comprehensive experiments on benchmark datasets, demonstrating the exceptional performance of the MTSAM framework. In this section, we empirically evaluate the proposed MTSAM on three benchmark datasets, including NYUv2 (Silberman et al., 2012), Cityscapes (Cordts et al., 2016), and PASCAL-Context (Everingham et al., 2010).
Researcher Affiliation | Academia | Southern University of Science and Technology; City University of Hong Kong; University of Technology Sydney
Pseudocode | No | The paper describes methods and formulas (e.g., Eq. (6), Eq. (7), Eq. (8), Eq. (9), Eq. (10)) and provides an overview of the architecture in figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/XuehaoWangFi/MTSAM.
Open Datasets | Yes | In this section, we empirically evaluate the proposed MTSAM on three benchmark datasets, including NYUv2 (Silberman et al., 2012), Cityscapes (Cordts et al., 2016), and PASCAL-Context (Everingham et al., 2010).
Dataset Splits | No | The paper evaluates on the NYUv2, Cityscapes, and PASCAL-Context datasets but does not explicitly detail the training, validation, and test splits used. It does not mention specific percentages, sample counts, or explicit references to predefined splits for reproduction.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the Adam optimizer and specific loss functions but does not provide version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | The batch size is set to 4 for NYUv2 and 8 for Cityscapes and PASCAL-Context. The cross-entropy loss, L1 loss, and cosine similarity loss are used as the loss functions of the semantic segmentation, depth estimation, and surface normal prediction tasks, respectively. The Adam optimizer is used to update fine-tuned parameters. In the Adam optimizer, the initial learning rate is set to 10^-3, a linear learning rate scheduler with warmup is adopted with the warmup rate set to 0.05, and the weight decay is set to 10^-6. The dropout rate is set to 0.1. For the proposed ToRA, we set p = q = 32, v = 8 on the NYUv2 and PASCAL-Context datasets, and p = q = 16, v = 4 on the Cityscapes dataset. The hyper-parameter λ is set to 1. The total number of fine-tuned epochs is set to 200, 50, and 30 for the NYUv2, Cityscapes, and PASCAL-Context datasets, respectively. For the Cityscapes and PASCAL-Context datasets, we use equal weights for each task (i.e., w_i equals 1 in Eq. (8)), while for the NYUv2 dataset, we follow (Lopes et al., 2023) to set the weights of the semantic segmentation, depth estimation, and surface normal prediction tasks to 1, 1, and 4, respectively.
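The optimizer recipe quoted above (Adam at an initial learning rate of 10^-3, linear schedule with a 0.05 warmup rate, weight decay 10^-6) can be sketched as a plain schedule function. This is a minimal illustration of the schedule's shape, not code from the paper's repository; the function name, step granularity, and decay-to-zero endpoint are assumptions.

```python
def linear_warmup_lr(step, total_steps, base_lr=1e-3, warmup_rate=0.05):
    """Learning rate at a given step under linear warmup then linear decay.

    Mirrors the reported setup: warmup_rate = 0.05 of total steps spent
    ramping from 0 to base_lr = 1e-3, then a linear decay (assumed to
    reach 0 at the final step).
    """
    warmup_steps = max(1, int(warmup_rate * total_steps))
    if step < warmup_steps:
        # warmup phase: ramp linearly from 0 up to base_lr
        return base_lr * step / warmup_steps
    # decay phase: ramp linearly from base_lr down to 0
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / remaining)


# Example: a hypothetical run of 1000 optimizer steps.
total = 1000
peak = linear_warmup_lr(50, total)    # end of warmup -> base_lr
start = linear_warmup_lr(0, total)    # first step -> 0
end = linear_warmup_lr(total, total)  # last step -> 0
```

In practice this shape is what PyTorch's `LambdaLR` (or similar schedulers) would compute per step; only the base learning rate, warmup rate, and weight decay are fixed by the paper's description.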