AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

Authors: Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on three benchmark datasets clearly validate the effectiveness of AugRefer. In AugRefer, our initial step involves devising a cross-modal augmentation mechanism to enrich 3D scenes by injecting objects and furnishing them with diverse and precise descriptions.
Researcher Affiliation Academia Xinyi Wang (1), Na Zhao (2)*, Zhiyuan Han (1), Dan Guo (3), Xun Yang (1) — (1) University of Science and Technology of China, (2) Singapore University of Technology and Design, (3) Hefei University of Technology
Pseudocode No Algorithm 1 in the supplementary material outlines the Plausible Insertion algorithm. The main paper text does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not explicitly state that code is provided or offer a link to a repository for the described methodology.
Open Datasets Yes We use three 3DVG datasets: ScanRefer (Chen et al. 2020), Nr3D (Achlioptas et al. 2020), and Sr3D (Achlioptas et al. 2020) to evaluate our method.
Dataset Splits Yes We use three 3DVG datasets: ScanRefer (Chen et al. 2020), Nr3D (Achlioptas et al. 2020), and Sr3D (Achlioptas et al. 2020) to evaluate our method.
Hardware Specification Yes Our experiments are conducted on four NVIDIA A100 80G GPUs, utilizing PyTorch and the AdamW optimizer.
Software Dependencies No The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies.
Experiment Setup Yes We adjust the batch size to 12 or 48 and augment training with 22.5k generated pairs for each dataset. The visual encoder's learning rate is set to 2e-3 for ScanRefer, while other layers are set to 2e-4 across 150 epochs. In contrast, Sr3D and Nr3D have learning rates of 1e-3 and 1e-4, respectively; Nr3D undergoes 200 epochs of training, whereas Sr3D requires only 100 epochs due to its simpler, template-generated descriptions.