DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Authors: Henry Zheng, Hao Shi, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that DenseGrounding significantly outperforms existing methods in overall accuracy, with improvements of 5.81% and 7.56% when trained on the comprehensive full dataset and the smaller mini subset, respectively, further advancing the SOTA in ego-centric 3D visual grounding. Our method also achieves 1st place and receives the Innovation Award in the CVPR 2024 Autonomous Grand Challenge Multi-View 3D Visual Grounding Track, validating its effectiveness and robustness.
Researcher Affiliation | Collaboration | 1 Department of Automation, BNRist, Tsinghua University; 2 AI Lab, Lenovo Research. EMAIL EMAIL {gaohuang}@tsinghua.edu.cn
Pseudocode | No | The paper describes methods in structured text and uses flowcharts (Figure 2) to illustrate the architecture. Figure 6 provides a sample input/output for the LLM prompt, but there is no explicit pseudocode or algorithm block labeled as such.
Open Source Code | No | Code
Open Datasets | Yes | The EmbodiedScan dataset (Wang et al., 2024), used in our experiments, is a large-scale, multi-modal, ego-centric dataset for comprehensive 3D scene understanding.
Dataset Splits | Yes | For benchmarking, the official dataset maintains a non-public test set for the test leaderboard and divides the original training set into new subsets for training and validation. In this paper, we refer to these as the training and validation sets, while the non-public test set is called the testing set. For the mini data in the Data column of Table 1 and the analysis experiments in Sec. 5.2, we use a smaller subset of the data as a proxy task when performing experiments. This subset is referred to as the mini set, available through the official release by Wang et al. (2024).
Hardware Specification | No | The paper describes the software components and training parameters but does not specify any hardware details such as GPU models, CPU, or memory used for the experiments.
Software Dependencies | No | The paper mentions using ResNet50, MinkNet34, a CLIP text encoder, and the AdamW optimizer, but it does not provide specific version numbers for the underlying libraries or frameworks (e.g., PyTorch version, specific library versions).
Experiment Setup | Yes | Our multi-view visual grounding model, DenseGrounding, is trained with the AdamW optimizer using a learning rate of 5e-4, weight decay of 5e-4, and a batch size of 48. The model is trained for 12 epochs, with the learning rate reduced by 0.1 at epochs 8 and 11. All other settings align with EmbodiedScan.
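The reported training configuration maps directly onto a standard PyTorch optimizer and step schedule. The sketch below shows only the stated hyperparameters (AdamW, lr 5e-4, weight decay 5e-4, 12 epochs, decay by 0.1 at epochs 8 and 11); the `model` and the training-loop body are placeholders, not the paper's actual implementation.

```python
import torch
from torch import nn

# Placeholder model standing in for the DenseGrounding network (assumption).
model = nn.Linear(16, 4)

# Hyperparameters as reported in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=5e-4)
# Reduce the learning rate by a factor of 0.1 at epochs 8 and 11.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[8, 11], gamma=0.1
)

for epoch in range(12):
    # ... one training epoch over batches of size 48 (omitted) ...
    scheduler.step()
```

After the full 12-epoch run, both milestones have fired, so the final learning rate is 5e-4 × 0.1 × 0.1 = 5e-6.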