CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
Authors: Jinpeng Li, Haiping Wang, Jiabin Chen, Yuan Liu, Zhiyang Dou, Yuexin Ma, Sibei Yang, Yuan Li, Wenping Wang, Zhen Dong, Bisheng Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the CityRefer dataset and a new synthetic dataset annotated by us, both of which demonstrate our method can produce accurate 3D visual grounding on a city-scale 3D point cloud. The source code is available at https://github.com/WHU-USI3DV/CityAnchor. |
| Researcher Affiliation | Academia | 1 LISMARS, Wuhan University; 2 Hong Kong University of Science and Technology; 3 University of Pennsylvania; 4 ShanghaiTech University; 5 Sun Yat-Sen University; 6 Texas A&M University |
| Pseudocode | No | The paper describes its methodology in natural language text and illustrates it with architectural diagrams (e.g., Figure 2 and Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/WHU-USI3DV/CityAnchor. |
| Open Datasets | Yes | To evaluate the performance of CityAnchor, we conduct experiments on the CityRefer dataset and a new synthetic self-annotated dataset. CityRefer (Miyanishi et al., 2023) is a 3D visual grounding dataset annotated from the city-scale SensatUrban dataset (Hu et al., 2021b). CityAnchor is a city-scale 3D visual grounding dataset. We use 25 city-scale point clouds of the STPLS3D (Chen et al., 2022) dataset and manually annotate them with text prompts. We will release our CityAnchor dataset under the MIT license. |
| Dataset Splits | Yes | CityRefer (Miyanishi et al., 2023)...We use 85% of them for training and 15% of them for evaluation. CityAnchor is a city-scale 3D visual grounding dataset...There are 1448 text-object pairs. 80% of these pairs are used in training while the rest are used in tests. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch on a single NVIDIA A100 GPU (40 GB). |
| Software Dependencies | No | The paper mentions "PyTorch", "LISA (Lai et al., 2023)", "Vicuna-7b-v1.3 (Zheng et al., 2024)", "LLaVA architecture", and "LoRA layers (Hu et al., 2021a)". While these are software components or models, specific version numbers for general software dependencies such as PyTorch, Python, or CUDA are not provided. |
| Experiment Setup | Yes | We use the AdamW optimizer with a batch size of 8 and a learning rate decaying from 2e-5 to 2e-7 with a cosine annealing scheduler. The training of CLM and FMM takes about 12 and 15 hours to converge. We set the threshold θ for the candidate object detection in CLM to a fixed 0.3 (except for the specialized analysis of RoI threshold) and the number of neighboring objects K for spatial context-aware feature enhancement in FMM to 5. We select positive and negative samples in a ratio of 1:3 for FMM training. |
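The learning-rate schedule quoted in the "Experiment Setup" row (cosine annealing from 2e-5 down to 2e-7) can be reproduced from the standard cosine-annealing formula. The sketch below is an assumption for illustration only: the function name and the step count are not from the paper, and the paper itself uses PyTorch's built-in scheduler rather than a hand-rolled one.

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=2e-5, lr_min=2e-7):
    """Cosine-annealed learning rate matching the reported 2e-5 -> 2e-7 decay.

    Hypothetical helper; equivalent in shape to PyTorch's
    torch.optim.lr_scheduler.CosineAnnealingLR with eta_min=2e-7.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )

# At step 0 the rate equals lr_max; at the final step it equals lr_min.
print(cosine_annealed_lr(0, 100))    # 2e-05
print(cosine_annealed_lr(100, 100))  # 2e-07
```

In PyTorch this corresponds to constructing `CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=2e-7)` on an AdamW optimizer initialized with `lr=2e-5`, as the quoted setup describes.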