Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bridge 2D-3D: Uncertainty-aware Hierarchical Registration Network with Domain Alignment
Authors: Zhixin Cheng, Jiacheng Deng, Xinjun Li, Baoqun Yin, Tianzhu Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies on RGB-D Scene V2 and 7-Scenes benchmarks demonstrate the superiority of our method, making it a state-of-the-art approach for image-to-point cloud registration tasks. |
| Researcher Affiliation | Academia | Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China |
| Pseudocode | No | The paper describes the proposed B2-3Dnet pipeline and its modules (UHMM, AMAM) in detail using descriptive text and a diagram (Figure 2), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Extensive experiments and ablation studies on RGB-D Scene V2 (Lai, Bo, and Fox 2014) and 7-Scenes (Glocker et al. 2013) benchmarks demonstrate the superiority of our method |
| Dataset Splits | Yes | RGB-D Scenes v2 consists of 14 scenes containing furniture. For each scene, we create point cloud fragments from every 25 consecutive depth frames and sample one RGB image per 25 frames. We select image-point-cloud pairs with an overlap ratio of at least 30%. Scenes 1-8 are used for training, 9-10 for validation, and 11-14 for testing, resulting in 1,748 training pairs, 236 validation pairs, and 497 testing pairs. 7-Scenes is a collection of tracked RGB-D camera frames. All seven indoor scenes were recorded from a handheld Kinect RGB-D camera at 640 × 480 resolution. We select image-to-point-cloud pairs from each scene with at least 50% overlap, adhering to the official sequence split for training, validation, and testing. This results in 4,048 training pairs, 1,011 validation pairs, and 2,304 testing pairs. |
| Hardware Specification | Yes | We used an NVIDIA GeForce RTX 3090 GPU for training. |
| Software Dependencies | No | The paper mentions several software components and methods like ResNet, FPN, KPFCNN, CNN, Transformers, PnP-RANSAC, Gradient Reversal Layer (GRL), but it does not provide specific version numbers for any of them (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Our Domain Classifier employs three fully connected layers with dimensions {128, 64, 2}. In the uncertainty estimation layer, the mean is obtained by averaging the image features at each layer, while the variance is normalized through a convolutional layer followed by a softplus activation. To prevent the variance from being zero, a small constant is added. Thus, our total loss function is expressed as: L = Lcoarse + Lfine + Lsig + Ld. Regarding the threshold for the coefficient λ of the gradient reversal layer (GRL), we found that the range between 0.001 and 0.1 is optimal, and through testing, we determined the best choice. In the process of selecting the total variance threshold γ, as γ changed, the values exhibited some fluctuations. |
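The Experiment Setup row describes two concrete components: a Domain Classifier of three fully connected layers with dimensions {128, 64, 2} fed through a gradient reversal layer (GRL), and an uncertainty estimation layer whose mean is the average of image features and whose variance is a convolution followed by softplus plus a small constant. A minimal PyTorch sketch of these two pieces follows; the input channel count, GRL coefficient λ = 0.01 (the paper only reports an optimal range of 0.001 to 0.1), the 1×1 convolution kernel, and the ε = 1e-6 constant are illustrative assumptions, not values stated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """GRL: identity in the forward pass, gradient scaled by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; no gradient w.r.t. lam.
        return -ctx.lam * grad_output, None


class DomainClassifier(nn.Module):
    """Three fully connected layers with dimensions {128, 64, 2}, behind a GRL."""

    def __init__(self, in_dim=128, grl_lambda=0.01):  # in_dim, lambda are assumptions
        super().__init__()
        self.grl_lambda = grl_lambda
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),  # binary domain logits (image vs. point cloud)
        )

    def forward(self, feats):
        return self.net(GradientReversal.apply(feats, self.grl_lambda))


class UncertaintyHead(nn.Module):
    """Mean = spatial average of image features; variance = conv -> softplus + eps."""

    def __init__(self, channels=128, eps=1e-6):  # channels, eps are assumptions
        super().__init__()
        self.var_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.eps = eps

    def forward(self, feat_map):  # feat_map: (B, C, H, W)
        mean = feat_map.mean(dim=(2, 3))  # per-channel mean over spatial dims
        # Softplus keeps the variance non-negative; eps keeps it strictly positive.
        var = F.softplus(self.var_conv(feat_map)) + self.eps
        return mean, var
```

These modules only sketch the shapes and positivity constraint described in the row; how their outputs feed the Lsig and Ld terms of the total loss L = Lcoarse + Lfine + Lsig + Ld is not specified in the quoted text.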