Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Authors: Youjun Zhao, Jiaying Lin, Rynson W. H. Lau

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that the proposed method outperforms SOTA methods on the existing OV-3DOD benchmarks. It also achieves promising OV-3DOD results even without any 3D annotations." "We conduct extensive experiments on the OV-3DOD benchmarks. Our method achieves superior performance compared to existing state-of-the-art approaches, demonstrating its effectiveness for the OV-3DOD task."
Researcher Affiliation | Academia | Youjun Zhao*, Jiaying Lin*, Rynson W.H. Lau, Department of Computer Science, City University of Hong Kong (EMAIL, EMAIL, EMAIL)
Pseudocode | No | The paper describes methods like Hierarchical Data Integration (HDI), Interactive Cross-Modal Alignment (ICMA), and Object-Focusing Context Adjustment (OFCA) using descriptive text and mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and extended version: https://youjunzhao.github.io/HCMA/
Open Datasets | Yes | "ScanNet (Dai et al. 2017) is a widely used 3D object detection dataset. ... SUN RGB-D (Song, Lichtenberg, and Xiao 2015) is another popular 3D object detection dataset."
Dataset Splits | No | The paper mentions using the ScanNet and SUN RGB-D datasets and conducting evaluations. It states "Our experimental setup follows that of OV-3DET (Lu et al. 2023) for fair comparison." However, it does not explicitly provide specific percentages, sample counts, or detailed methodologies for training, validation, and test splits within the provided text.
Hardware Specification | Yes | "Experiments are conducted on a single RTX 4090 GPU."
Software Dependencies | No | The paper mentions specific models like "3DETR (Misra, Girdhar, and Joulin 2021) as our 3D detector backbone" and "pre-trained CLIP image and text encoders," and the "AdamW optimizer." However, it does not provide specific version numbers for software libraries, programming languages (e.g., Python), or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | "We train our model using the AdamW optimizer with a cosine learning rate scheme. The base learning rate and the weight decay are set to 10^-4 and 0.1, respectively. The temperature parameter τ is set to 0.1 in contrastive learning. We adopt 3DETR (Misra, Girdhar, and Joulin 2021) as our 3D detector backbone. The number of object queries for 3DETR is set to 128. Experiments are conducted on a single RTX 4090 GPU. The number of training epochs is the same as for the baseline method OV-3DET."
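The training recipe above (cosine learning-rate decay from a base of 10^-4, and temperature-scaled contrastive alignment with τ = 0.1) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are ours, and a standard symmetric InfoNCE loss is assumed for the cross-modal contrastive objective.

```python
import numpy as np

def cosine_lr(step, total_steps, base_lr=1e-4):
    """Cosine schedule decaying from base_lr at step 0 to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * step / total_steps))

def _log_softmax(x):
    """Numerically stable row-wise log-softmax."""
    m = x.max(axis=1, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=1, keepdims=True))

def contrastive_loss(feat_3d, feat_text, tau=0.1):
    """Symmetric InfoNCE over L2-normalized 3D and text embeddings.

    Matching 3D/text pairs sit on the diagonal of the (N, N) similarity
    matrix; tau is the temperature (0.1 in the paper).
    """
    feat_3d = feat_3d / np.linalg.norm(feat_3d, axis=1, keepdims=True)
    feat_text = feat_text / np.linalg.norm(feat_text, axis=1, keepdims=True)
    logits = feat_3d @ feat_text.T / tau
    idx = np.arange(logits.shape[0])
    loss_3d_to_text = -_log_softmax(logits)[idx, idx].mean()
    loss_text_to_3d = -_log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_3d_to_text + loss_text_to_3d)
```

A small temperature such as 0.1 sharpens the softmax over similarities, so the loss concentrates on the hardest negative pairs during alignment.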