Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Authors: Jiaqi Huang, Zunnan Xu, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our method greatly surpasses state-of-the-art fully fine-tuned methods in referring image segmentation, with only 0.9% to 1.8% backbone parameter updates. We employ three challenging referring image segmentation benchmarks in our experiments: RefCOCO (Kazemzadeh et al. 2014) is widely used as a benchmark for referring image segmentation. |
| Researcher Affiliation | Academia | Jiaqi Huang, Zunnan Xu, Ting Liu, Yong Liu, Haonan Han, Kehong Yuan, Xiu Li — Tsinghua Shenzhen International Graduate School, Tsinghua University, University Town of Shenzhen, Nanshan District, Shenzhen, Guangdong, P.R. China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code https://github.com/jiaqihuang01/DETRIS |
| Open Datasets | Yes | We employ three challenging referring image segmentation benchmarks in our experiments: RefCOCO (Kazemzadeh et al. 2014)... RefCOCO+ (Kazemzadeh et al. 2014)... G-Ref (Yu et al. 2016)... |
| Dataset Splits | Yes | RefCOCO... The dataset is divided into four subsets, consisting of 120,624 training samples, 10,834 validation samples, 5,657 samples for test A, and 5,095 samples for test B, respectively. RefCOCO+... The dataset is divided into four subsets: 120,624 train, 10,758 validation, 5,726 test A, and 4,889 test B samples. |
| Hardware Specification | Yes | DETRIS-B is trained on 2 A100 GPUs with a batch size of 32, while DETRIS-L uses 4 A100 GPUs with a batch size of 64 and an initial learning rate of 0.0002. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify any software libraries or their version numbers. |
| Experiment Setup | Yes | The Dense Aligner (dim=128) is applied at layers [1, 3, 5, 7, 9, 11] for DETRIS-B and [2, 6, 10, 14, 18, 22] for DETRIS-L. The Text Adapter (dim=64) is applied at layers [1, 3, 5, 7, 9, 11] in both models. We train the framework end-to-end for 50 epochs using the Adam optimizer. The learning rate starts at 0.0001 and decays by 0.1 at epoch 35. DETRIS-B is trained on 2 A100 GPUs with a batch size of 32, while DETRIS-L uses 4 A100 GPUs with a batch size of 64 and an initial learning rate of 0.0002. |
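The reported setup can be summarized in a small configuration sketch. This is an illustrative reconstruction, not code from the DETRIS repository: the dictionary keys and the `lr_at_epoch` helper are assumed names, while the numeric values (adapter dims, layer indices, batch sizes, learning rates, and the 0.1 decay at epoch 35 over 50 epochs) come from the paper's stated settings.

```python
# Hypothetical summary of the DETRIS training setup described in the paper.
# Structure and names are assumptions; the values are quoted from the text.
DETRIS_CONFIG = {
    "DETRIS-B": {
        "dense_aligner": {"dim": 128, "layers": [1, 3, 5, 7, 9, 11]},
        "text_adapter": {"dim": 64, "layers": [1, 3, 5, 7, 9, 11]},
        "gpus": 2,            # 2x A100
        "batch_size": 32,
        "base_lr": 1e-4,
    },
    "DETRIS-L": {
        "dense_aligner": {"dim": 128, "layers": [2, 6, 10, 14, 18, 22]},
        "text_adapter": {"dim": 64, "layers": [1, 3, 5, 7, 9, 11]},
        "gpus": 4,            # 4x A100
        "batch_size": 64,
        "base_lr": 2e-4,
    },
    "epochs": 50,
    "optimizer": "Adam",
}


def lr_at_epoch(epoch, base_lr, decay_epoch=35, gamma=0.1):
    """Step-decay schedule from the paper: the learning rate starts at
    base_lr and is multiplied by gamma (0.1) at epoch 35 of a 50-epoch run."""
    return base_lr * (gamma if epoch >= decay_epoch else 1.0)
```

For example, `lr_at_epoch(0, 1e-4)` gives the initial DETRIS-B rate of 0.0001, and from epoch 35 onward the rate is a tenth of that.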