DiffusionREC: Diffusion Model with Adaptive Condition for Referring Expression Comprehension
Authors: Jingcheng Ke, Waikeung Wong, Jia Wang, Mu Li, Lunke Fei, Jie Wen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five datasets show that DiffusionREC not only effectively addresses the limitations of existing REC methods but also surpasses them, delivering superior performance. |
| Researcher Affiliation | Academia | Jingcheng Ke1, Waikeung Wong2*, Jia Wang3, Mu Li4, Lunke Fei1, Jie Wen4. 1School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China; 2School of Fashion and Textiles, The Hong Kong Polytechnic University, Hong Kong; 3College of Medical Information Engineering, Guangdong Pharmaceutical University; 4School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen |
| Pseudocode | No | The paper describes methods using text and mathematical equations. There is no explicit section or figure labeled 'Pseudocode' or 'Algorithm', nor are there any structured code blocks presented in the paper. |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Extensive experiments on five datasets show that DiffusionREC not only effectively addresses the limitations of existing REC methods but also surpasses them, delivering superior performance. Extensive evaluation results for our proposed method on five challenging REC benchmarks: RefCOCO (Kazemzadeh et al. 2014), RefCOCO+ (Kazemzadeh et al. 2014), RefCOCOg (Mao et al. 2016), Flickr30K Entities (Plummer et al. 2015) and RefClef (Kazemzadeh et al. 2014). |
| Dataset Splits | Yes | We provide detailed information about these datasets in the supplementary materials. The detailed results can be found in Table 1, which compares our method with existing approaches on the RefCOCO, RefCOCO+, RefCOCOg, Flickr30K, RefClef, and Ref-Reasoning datasets. Table 1 reports columns 'val', 'testA', and 'testB' for RefCOCO and RefCOCO+, 'val' and 'test' for RefCOCOg, and 'test' for Flickr30K and RefClef. |
| Hardware Specification | No | The paper does not specify any details regarding the hardware used for conducting experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In this study, we adopt the same approach as described in (Deng et al. 2021) for embedding the image and expression. We set the number of randomly generated bounding boxes, denoted as N, to 1,000. Furthermore, during the training phase, we set the number of epochs, batch size, a hyperparameter, and learning rate to 50, 8, 0.5, and 10⁻⁴, respectively. The denoising strategy we use is Denoising Diffusion Probabilistic Models (DDPM) (Ho, Jain, and Abbeel 2020) with T = 1,000 diffusion steps and a square-root noise schedule. During evaluation, we found that the method performs best when the number of preserved bounding boxes is less than 3% of N (Thr = 0.03N). |
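The setup cell above only names the diffusion ingredients (DDPM, T = 1,000 steps, a square-root noise schedule, N = 1,000 random boxes); the paper releases no code. The sketch below is therefore an illustrative assumption, not the authors' implementation: it uses the square-root schedule form ᾱ_t = 1 − √(t/T + s) popularized by Diffusion-LM (the offset `s` and the (cx, cy, w, h) box parameterization are guesses) to show how N random boxes would be noised in the DDPM forward process.

```python
import numpy as np

def sqrt_alpha_bar_schedule(T=1000, s=1e-4):
    """Square-root noise schedule: alpha_bar_t = 1 - sqrt(t/T + s).

    The paper only names "a square-root noise schedule"; this exact
    form and the offset s are assumptions borrowed from Diffusion-LM.
    """
    t = np.arange(T)  # t = 0 .. T-1 keeps alpha_bar strictly positive
    return 1.0 - np.sqrt(t / T + s)

def q_sample(x0, t, alpha_bar, rng):
    """DDPM forward step: x_t = sqrt(ab_t)*x0 + sqrt(1-ab_t)*eps."""
    ab = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

# Noise N = 1,000 random boxes (cx, cy, w, h in [0, 1]) at a mid timestep.
T, N = 1000, 1000
alpha_bar = sqrt_alpha_bar_schedule(T)
rng = np.random.default_rng(0)
boxes = rng.uniform(0.0, 1.0, size=(N, 4))
noisy = q_sample(boxes, t=T // 2, alpha_bar=alpha_bar, rng=rng)
```

At evaluation time the paper keeps fewer than 3% of the N denoised boxes (Thr = 0.03N), i.e. at most 30 boxes here, presumably ranked by a confidence score.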