Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Authors: Jiaqi Huang, Zunnan Xu, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li

AAAI 2025

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting LLM response (quotations are taken from the paper).
Research Type: Experimental
    Evidence: "Experiments demonstrate that our method greatly surpasses state-of-the-art fully fine-tuned methods in referring image segmentation, with only 0.9% to 1.8% backbone parameter updates." "We employ three challenging referring image segmentation benchmarks in our experiments: RefCOCO (Kazemzadeh et al. 2014) is widely used as a benchmark for referring image segmentation."
Researcher Affiliation: Academia
    Evidence: "Jiaqi Huang1, Zunnan Xu1, Ting Liu1, Yong Liu1, Haonan Han1, Kehong Yuan1, Xiu Li1. 1Tsinghua Shenzhen International Graduate School, Tsinghua University, University Town of Shenzhen, Nanshan District, Shenzhen, Guangdong, P.R. China. EMAIL, EMAIL"
Pseudocode: No
    The paper describes methods using mathematical formulations and descriptive text, but no structured pseudocode or algorithm blocks are provided.
Open Source Code: Yes
    Code: https://github.com/jiaqihuang01/DETRIS
Open Datasets: Yes
    Evidence: "We employ three challenging referring image segmentation benchmarks in our experiments: RefCOCO (Kazemzadeh et al. 2014)... RefCOCO+ (Kazemzadeh et al. 2014)... G-Ref (Yu et al. 2016)..."
Dataset Splits: Yes
    Evidence: "RefCOCO... The dataset is divided into four subsets, consisting of 120,624 training samples, 10,834 validation samples, 5,657 samples for test A, and 5,095 samples for test B, respectively. RefCOCO+... The dataset is divided into four subsets: 120,624 train, 10,758 validation, 5,726 test A, and 4,889 test B samples."
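The split sizes quoted above can be captured as a small lookup table for scripting reproducibility checks. This is a minimal sketch; the key names ("train", "val", "testA", "testB") and the helper `total_samples` are illustrative labels, not identifiers from the paper or its code release:

```python
# Dataset split sizes as reported in the paper's experiments section.
# Key names are illustrative; the counts are the quoted figures.
SPLITS = {
    "RefCOCO":  {"train": 120_624, "val": 10_834, "testA": 5_657, "testB": 5_095},
    "RefCOCO+": {"train": 120_624, "val": 10_758, "testA": 5_726, "testB": 4_889},
}

def total_samples(name: str) -> int:
    """Sum the four subsets of one benchmark as a sanity check."""
    return sum(SPLITS[name].values())
```

A quick check against a downloaded copy of the data would compare `total_samples("RefCOCO")` with the number of annotations actually loaded.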
Hardware Specification: Yes
    Evidence: "DETRIS-B is trained on 2 A100 GPUs with a batch size of 32, while DETRIS-L uses 4 A100 GPUs with a batch size of 64 and an initial learning rate of 0.0002."
Software Dependencies: No
    The paper mentions using the "Adam optimizer" but does not specify any software libraries or their version numbers.
Experiment Setup: Yes
    Evidence: "The Dense Aligner (dim=128) is applied at layers [1, 3, 5, 7, 9, 11] for DETRIS-B and [2, 6, 10, 14, 18, 22] for DETRIS-L. The Text Adapter (dim=64) is applied at layers [1, 3, 5, 7, 9, 11] in both models. We train the framework end-to-end for 50 epochs using the Adam optimizer. The learning rate starts at 0.0001 and decays by 0.1 at epoch 35. DETRIS-B is trained on 2 A100 GPUs with a batch size of 32, while DETRIS-L uses 4 A100 GPUs with a batch size of 64 and an initial learning rate of 0.0002."
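The quoted setup amounts to a step learning-rate schedule plus a handful of per-model hyperparameters. The sketch below transcribes those reported values into config dictionaries and a schedule function; the names (`DETRIS_B`, `lr_at_epoch`, the dictionary keys) are illustrative assumptions, not taken from the authors' code:

```python
# Hyperparameters as reported for the two model variants.
# Names and dict keys are illustrative; values are the quoted figures.
DETRIS_B = {
    "dense_aligner": {"dim": 128, "layers": [1, 3, 5, 7, 9, 11]},
    "text_adapter":  {"dim": 64,  "layers": [1, 3, 5, 7, 9, 11]},
    "gpus": 2, "batch_size": 32, "init_lr": 1e-4,
}
DETRIS_L = {
    "dense_aligner": {"dim": 128, "layers": [2, 6, 10, 14, 18, 22]},
    "text_adapter":  {"dim": 64,  "layers": [1, 3, 5, 7, 9, 11]},
    "gpus": 4, "batch_size": 64, "init_lr": 2e-4,
}

def lr_at_epoch(epoch: int, init_lr: float, decay_epoch: int = 35,
                gamma: float = 0.1, total_epochs: int = 50) -> float:
    """Step schedule: init_lr until decay_epoch, then init_lr * gamma."""
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch out of range")
    return init_lr * (gamma if epoch >= decay_epoch else 1.0)
```

In a PyTorch training loop the same schedule would typically be expressed with `torch.optim.Adam` plus `torch.optim.lr_scheduler.StepLR`; the pure-Python form above just makes the reported numbers checkable.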