Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Authors: Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS 5.3 EVALUATION METRICS 5.4 QUANTITATIVE COMPARISONS 5.5 ABLATION STUDIES Tab. 1 and Tab. 2 show quantitative comparisons on the val and test sets of Intent3D. |
| Researcher Affiliation | Academia | Weitai Kang1, Mengxue Qu2, Jyoti Kini3, Yunchao Wei2, Mubarak Shah3, Yan Yan1 (1University of Illinois Chicago, 2Beijing Jiaotong University, 3University of Central Florida) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides diagrams (Fig. 4, Fig. 17) illustrating the model architecture and loss functions, but no textual pseudocode. |
| Open Source Code | Yes | Code: https://github.com/WeitaiKang/Intent3D. Project: https://weitaikang.github.io/Intent3D-webpage/. |
| Open Datasets | Yes | To tackle this challenge, we introduce the new Intent3D dataset, consisting of 44,990 intention texts associated with 209 fine-grained classes from 1,042 scenes of the ScanNet [Dai et al., 2017] dataset. |
| Dataset Splits | Yes | The dataset is split into train, val, and test sets, containing 35,850, 2,285, and 6,855 samples, respectively, each with disjoint scenes. Our train set comes from ScanNet's train split, while the val and test sets are derived from its val split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running the experiments. It only mentions general training parameters and model configurations. |
| Software Dependencies | No | The paper mentions using 'PointNet++ [Qi et al., 2017]' and 'RoBERTa [Liu et al., 2019]' as backbones, 'spaCy [Honnibal et al., 2020]' for text processing, and 'ChatGPT (GPT-4)' for data generation. However, it does not specify version numbers for general software dependencies or libraries such as PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Our IntentNet is trained from scratch for 90 epochs with a batch size of 24. The learning rate is 0.001 for PointNet++ and 0.0001 for the rest of the network, which decays by 0.1 at the 65th epoch. The RoBERTa is frozen. The number of point tokens is 1024, and the maximum length for text tokens is set to 256. The hidden dimension used is 288. For BUTD-DETR [Jain et al., 2022], we adhere to its official configuration in ScanRefer [Chen et al., 2020]. The batch size is set to 24, and the learning rate (which is the same as ours) decreases to one-tenth at the 65th epoch. It finally takes 100 epochs to converge. For EDA [Wu et al., 2023], we also follow its official configuration, where the batch size is set to 48. The learning rate for backbones is 0.002, and for the rest, it is 0.0002. The learning rate decreases to one-tenth at the 50th and 75th epochs. It takes 104 epochs to converge. For 3D-VisTA [Zhu et al., 2023], we use its official configuration in ScanRefer [Chen et al., 2020], where the batch size is set to 64, and the learning rate is 0.0001. The warm-up step is set to 5000. Although the official training epoch is 100, we find that the model converges at the 47th epoch. In the case of Chat-3D v2, it takes 3 epochs to fine-tune the model. |
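For reference, the training hyperparameters quoted in the Experiment Setup row can be collected into a single machine-readable sketch. This is purely illustrative: the `TRAIN_CONFIGS` name and dict layout are our own, and only the numeric values come from the paper's text.

```python
# Hypothetical summary of the per-model training setups reported in the paper.
# Structure and key names are assumptions; values are quoted from the paper.

TRAIN_CONFIGS = {
    "IntentNet": {
        "epochs": 90,
        "batch_size": 24,
        "lr_backbone": 1e-3,   # PointNet++ backbone
        "lr_rest": 1e-4,       # rest of the network
        "lr_decay": {"factor": 0.1, "at_epochs": [65]},
        "text_encoder": "RoBERTa (frozen)",
        "num_point_tokens": 1024,
        "max_text_tokens": 256,
        "hidden_dim": 288,
    },
    "BUTD-DETR": {             # official ScanRefer configuration
        "epochs": 100,
        "batch_size": 24,
        "lr_decay": {"factor": 0.1, "at_epochs": [65]},
    },
    "EDA": {                   # official configuration
        "epochs": 104,
        "batch_size": 48,
        "lr_backbone": 2e-3,
        "lr_rest": 2e-4,
        "lr_decay": {"factor": 0.1, "at_epochs": [50, 75]},
    },
    "3D-VisTA": {              # converged at epoch 47 (official schedule: 100)
        "epochs": 47,
        "batch_size": 64,
        "lr": 1e-4,
        "warmup_steps": 5000,
    },
    "Chat-3D v2": {            # fine-tuning only
        "epochs": 3,
    },
}

if __name__ == "__main__":
    for name, cfg in TRAIN_CONFIGS.items():
        print(f"{name}: {cfg['epochs']} epochs, "
              f"batch size {cfg.get('batch_size', 'n/a')}")
```

Such a table makes it easy to spot that the baselines were run under heterogeneous budgets (e.g. EDA uses double the batch size and a different decay schedule), which matters when interpreting the quantitative comparisons.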