Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Authors: Duc-Hai Pham, Duc-Dung Nguyen, Anh Pham, Tuan Ho, Phong Nguyen, Khoi Nguyen, Rang Nguyen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed framework has two key advantages: (1) Generalizability, as it is compatible with various 3D semantic scene completion methods, including 2D-3D lifting and 3D-2D transformer techniques; and (2) Effectiveness, as demonstrated by experiments on the Semantic KITTI and NYUv2 datasets, where our method achieves up to 85% of the fully supervised performance using only 10% of the labeled data. |
| Researcher Affiliation | Collaboration | Duc-Hai Pham¹, Duc-Dung Nguyen², Anh Pham², Tuan Ho¹, Phong Nguyen¹, Khoi Nguyen¹, Rang Nguyen¹ — ¹VinAI Research, Vietnam; ²AITech Lab., Ho Chi Minh City University of Technology, VNU-HCM, Vietnam |
| Pseudocode | No | A pseudo-code implementation is stated to be provided in the Supplementary Material, not in the main paper. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach on the outdoor Semantic KITTI (Behley et al. 2019) and indoor NYUv2 (Silberman et al. 2012) datasets. |
| Dataset Splits | Yes | For Semantic KITTI, we sample 40, 198, and 383 frames (corresponding to 1%, 5%, and 10% of the training set), consistent with existing setups (Wang et al. 2023; Behley et al. 2019). For NYUv2, we uniformly sample 40 and 80 frames (representing 5% and 10% of the training set). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide software dependency details with version numbers (e.g., library or solver names and their versions). |
| Experiment Setup | Yes | We employ 2 layers of Dilated Neighborhood Attention with 4 heads, a kernel size of 7, and 4 dilation rates (1, 2, 4, 8). Additional details on losses for training each SSC network and further implementation specifics are provided in the Supplementary Material. |
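
The window arithmetic behind that attention configuration can be illustrated with a small stand-alone sketch. This is a simplified 1D analogue of dilated neighborhood attention, not the authors' implementation (which uses 2 layers and 4 heads over 2D/3D features); it only shows which positions a query attends to for kernel size 7 at dilation rates 1, 2, 4, and 8, and how the receptive span grows as `dilation * (kernel_size - 1) + 1`:

```python
def dilated_neighborhood(i, n, kernel_size, dilation):
    """Indices attended to by query position i in a 1D dilated
    neighborhood attention; the window is shifted (not clipped)
    to stay inside [0, n), mirroring neighborhood-attention
    border handling. Simplified illustration, not the paper's code."""
    half = kernel_size // 2
    span = dilation * (kernel_size - 1)   # distance from first to last neighbor
    start = i - half * dilation
    start = max(0, min(start, n - 1 - span))  # keep the full window in bounds
    return [start + j * dilation for j in range(kernel_size)]

# Query at position 32 in a length-64 sequence, kernel size 7,
# at the four dilation rates used in the setup above.
for dilation in (1, 2, 4, 8):
    nbrs = dilated_neighborhood(32, 64, kernel_size=7, dilation=dilation)
    print(dilation, nbrs, nbrs[-1] - nbrs[0] + 1)
```

Each row prints the 7 attended indices and the receptive span (7, 13, 25, and 49 positions for dilations 1, 2, 4, and 8), showing how stacking the four dilation rates widens context without increasing the per-query attention cost.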