DDPA-3DVG: Vision-Language Dual-Decoupling and Progressive Alignment for 3D Visual Grounding
Authors: Hongjie Gu, Jinlong Fan, Liang Zheng, Jing Zhang, Yuxiang Yang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three challenging benchmarks, ScanRefer, Nr3D, and Sr3D, demonstrate that our method achieves state-of-the-art performance, validating its effectiveness in 3D visual grounding. |
| Researcher Affiliation | Academia | Hongjie Gu1, Jinlong Fan1, Liang Zheng1, Jing Zhang2, Yuxiang Yang1. 1School of Electronics and Information, Hangzhou Dianzi University, China; 2School of Computer Science, Wuhan University, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2: Overview of the Proposed DDPA-3DVG), but it does not contain any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | The code will be released at https://github.com/HDU-VRLab/DDPA-3DVG. |
| Open Datasets | Yes | We evaluated the effectiveness of DDPA-3DVG using three widely adopted and challenging datasets: ScanRefer [Chen et al., 2020], Sr3D and Nr3D [Achlioptas et al., 2020]. ScanRefer is a 3D visual grounding dataset constructed upon 800 scenes from ScanNet [Dai et al., 2017]. |
| Dataset Splits | No | The paper mentions evaluating on the 'ScanRefer validation set' and that each scene is 'categorized as easy or hard' based on object instances. It also states 'Unique (19%) / Multiple (81%)' for ScanRefer, referring to object characteristics. However, it does not explicitly provide the specific training/test/validation split percentages, sample counts, or a detailed splitting methodology used for their experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions employing "a pre-trained RoBERTa [Liu et al., 2019] model" and "PointNet++ [Qi et al., 2017]", as well as "existing NLP tools [Schuster et al., 2015; Wu et al., 2019]". However, it does not list specific software libraries or frameworks with their version numbers that are necessary to replicate the experiments (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | No | The paper discusses the progressive alignment module, the prediction head, and the alignment losses, and compares its convergence rate to another method (Figure 5, noting that 'the number of epochs required for our method to reach a performance of 52.7% is 37 fewer than that of EDA'). However, it does not explicitly report the hyperparameter values (e.g., learning rate, batch size, optimizer settings) or other system-level training configurations needed to replicate the proposed method. |
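To make the missing-setup finding concrete, the sketch below shows the kind of training configuration a replication attempt would need the paper to report, together with a small check for absent fields. Every value in the config is a hypothetical placeholder, not a number from the paper; only the `roberta-base` text encoder and `PointNet++` point encoder are components the paper actually names.

```python
# Illustrative sketch (not from the paper): the reproducibility-relevant
# training details DDPA-3DVG would need to report. All numeric values
# are hypothetical placeholders.

train_config = {
    "text_encoder": "roberta-base",   # paper states a pre-trained RoBERTa is used
    "point_encoder": "PointNet++",    # paper states PointNet++ is used
    "optimizer": "AdamW",             # placeholder assumption
    "learning_rate": 1e-4,            # placeholder value
    "batch_size": 16,                 # placeholder value
    "epochs": 100,                    # placeholder value
}

def missing_fields(config, required):
    """Return the required reproducibility fields absent from a reported config."""
    return [key for key in required if key not in config]

required = ["optimizer", "learning_rate", "batch_size", "epochs"]
print(missing_fields(train_config, required))  # → []
print(missing_fields({"optimizer": "AdamW"}, required))  # → ['learning_rate', 'batch_size', 'epochs']
```

A checklist like this mirrors how the table above was scored: a variable is marked "No" when any of its required fields cannot be found in the paper.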