Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Authors: Ming Dai, Jian Li, Jiedong Zhuang, Xian Zhang, Wankou Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the efficacy and soundness of C3VG, which significantly outperforms state-of-the-art REC and RIS methods by a substantial margin. [Experimental Setup] We evaluate the proposed model on the RefCOCO (Yu et al. 2016), RefCOCO+, and RefCOCOg (Nagaraja, Morariu, and Davis 2016) datasets. The maximum sentence length is set to 20. The images are resized to 320 × 320. Based on previous works (Zhu et al. 2022), mIoU and Prec@0.5 (Acc(REC) in the ablation study) are adopted to evaluate the performance of methods. We train our models for 30 epochs with a batch size of 16. Adam (Kingma and Ba 2014) is adopted as our optimizer. All experiments are conducted on a system with dual NVIDIA 4090 GPUs. |
| Researcher Affiliation | Collaboration | Ming Dai (1), Jian Li (2), Jiedong Zhuang (3), Xian Zhang (1), Wankou Yang (1,4); (1) School of Automation, Southeast University, China; (2) Youtu Lab, Tencent, China; (3) College of Information Science and Electronic Engineering, Zhejiang University, China; (4) Advanced Ocean Institute of Southeast University, Nantong, China |
| Pseudocode | No | The paper describes its methodology using prose, mathematical equations, and diagrams. There are no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Empirical evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the efficacy and soundness of C3VG, which significantly outperforms state-of-the-art REC and RIS methods by a substantial margin. [Experimental Setup] We evaluate the proposed model on the RefCOCO (Yu et al. 2016), RefCOCO+, and RefCOCOg (Nagaraja, Morariu, and Davis 2016) datasets. |
| Dataset Splits | Yes | We evaluate the proposed model on the RefCOCO (Yu et al. 2016), RefCOCO+, and RefCOCOg (Nagaraja, Morariu, and Davis 2016) datasets. The maximum sentence length is set to 20. The images are resized to 320 × 320. [...] The single-task part presented in Tab. 1 showcases a comparison between our method and prior advanced REC approaches. [...] [Table 1 header:] Main results on REC datasets. Bold denotes the best performance. Underline denotes the second best performance. Columns: Method, Publication, Backbone, Data Size; RefCOCO (val / testA / testB), RefCOCO+ (val / testA / testB), RefCOCOg (val(U) / test(U)), Time (ms) |
| Hardware Specification | Yes | All experiments are conducted on a system with dual NVIDIA 4090 GPUs. |
| Software Dependencies | No | Adam (Kingma and Ba 2014) is adopted as our optimizer. No other specific software versions for libraries or frameworks are mentioned. |
| Experiment Setup | Yes | We train our models for 30 epochs with a batch size of 16. Adam (Kingma and Ba 2014) is adopted as our optimizer. The weighting factors σ_l1 and σ_giou are set to 0.5 and 0.2, respectively, while σ_dice and σ_bce are both set to 1.0 by default. Finally, the overall consistency constraint loss is defined as L_bcc = λ1·L_b2m + λ2·L_m2b, with the weighting coefficients λ1 and λ2 set to 1 and 3, respectively. [...] where λ_rec, λ_bcc, and λ_c are set to 0.5, 0.1, and 0.3, respectively. |
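The quoted setup lists four weighting schemes (box regression, segmentation, consistency, and the top-level objective). As a plain-arithmetic sanity check, they could be assembled as below. This is a minimal sketch assuming a standard weighted-sum objective; all function names, and the `l_c` term that λ_c weights, are illustrative assumptions, since the excerpt does not specify the full objective and no code was released.

```python
# Loss weights as quoted in the reproducibility table (C3VG, AAAI 2025).
SIGMA_L1, SIGMA_GIOU = 0.5, 0.2                    # box-regression weights
SIGMA_DICE, SIGMA_BCE = 1.0, 1.0                   # segmentation weights
LAMBDA1, LAMBDA2 = 1.0, 3.0                        # consistency-constraint weights
LAMBDA_REC, LAMBDA_BCC, LAMBDA_C = 0.5, 0.1, 0.3   # top-level weights

def rec_loss(l_l1: float, l_giou: float) -> float:
    """Weighted box-regression loss: sigma_l1 * L1 + sigma_giou * GIoU."""
    return SIGMA_L1 * l_l1 + SIGMA_GIOU * l_giou

def seg_loss(l_dice: float, l_bce: float) -> float:
    """Weighted segmentation loss: sigma_dice * Dice + sigma_bce * BCE."""
    return SIGMA_DICE * l_dice + SIGMA_BCE * l_bce

def bcc_loss(l_b2m: float, l_m2b: float) -> float:
    """Bidirectional consistency constraint: L_bcc = lam1 * L_b2m + lam2 * L_m2b."""
    return LAMBDA1 * l_b2m + LAMBDA2 * l_m2b

def overall_loss(l_rec: float, l_bcc: float, l_c: float) -> float:
    """Top-level weighted sum with lam_rec, lam_bcc, lam_c. What lam_c
    weights is not stated in the excerpt, so l_c is left as an input."""
    return LAMBDA_REC * l_rec + LAMBDA_BCC * l_bcc + LAMBDA_C * l_c
```

With unit sub-losses, `rec_loss(1, 1)` gives 0.7 and `bcc_loss(1, 1)` gives 4.0, so the top-level sum with `l_c = 1` is 0.5·0.7 + 0.1·4.0 + 0.3·1.0 = 1.05, which makes the relative influence of each quoted weight easy to inspect.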