DDPA-3DVG: Vision-Language Dual-Decoupling and Progressive Alignment for 3D Visual Grounding

Authors: Hongjie Gu, Jinlong Fan, Liang Zheng, Jing Zhang, Yuxiang Yang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on three challenging benchmarks, ScanRefer, Nr3D, and Sr3D, demonstrate that our method achieves state-of-the-art performance, validating its effectiveness in 3D visual grounding."
Researcher Affiliation | Academia | Hongjie Gu1, Jinlong Fan1, Liang Zheng1, Jing Zhang2, Yuxiang Yang1. 1School of Electronics and Information, Hangzhou Dianzi University, China; 2School of Computer Science, Wuhan University, China. EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2: Overview of the Proposed DDPA-3DVG), but it does not contain any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | "The code will be released at https://github.com/HDU-VRLab/DDPA-3DVG."
Open Datasets | Yes | "We evaluated the effectiveness of DDPA-3DVG using three widely adopted and challenging datasets: ScanRefer [Chen et al., 2020], Sr3D and Nr3D [Achlioptas et al., 2020]." ScanRefer is a 3D visual grounding dataset constructed upon 800 scenes from ScanNet [Dai et al., 2017].
Dataset Splits | No | The paper mentions evaluating on the ScanRefer validation set, and that each scene is categorized as easy or hard based on the number of object instances. It also reports "Unique (19%)" and "Multiple (81%)" subsets for ScanRefer, which refer to object characteristics rather than data partitions. However, it does not explicitly provide the training/validation/test split percentages, sample counts, or the splitting methodology used in the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions employing "a pre-trained RoBERTa [Liu et al., 2019] model" and "PointNet++ [Qi et al., 2017]", as well as "existing NLP tools [Schuster et al., 2015; Wu et al., 2019]". However, it does not list the specific software libraries or frameworks, with version numbers, needed to replicate the experiments (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | No | The paper discusses the progressive alignment module, prediction head, and alignment losses, and compares convergence rates against another method (Figure 5, noting that "the number of epochs required for our method to reach a performance of 52.7% is 37 fewer than that of EDA"). However, it does not explicitly provide hyperparameter values (e.g., learning rate, batch size, optimizer settings) or other system-level training configurations for the proposed method.