Point Cloud Semantic Segmentation with Sparse and Inhomogeneous Annotations

Authors: Zhiyi Pan, Nan Zhang, Wei Gao, Shan Liu, Ge Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Dataset. S3DIS (Armeni et al. 2016) contains 271 rooms across six areas with 13 semantic categories. We train our model on Areas 1, 2, 3, 4, and 6, and evaluate the segmentation performance in Area 5. ScanNet V2 (Dai et al. 2017) consists of 1,513 scanned scenes obtained from 707 different indoor environments and provides 21 semantic categories for each point. We utilize 1,201 scenes for training and 312 scenes for validation, according to the official split. Annotation setting. To compare with alternatives, the performance at label rates of 0.01% on S3DIS and 20 points per scene (label rate is about 0.014%) on ScanNet V2 are reported. For comprehensive validation, we set G to 1, 10, 20, and M to regulate the inhomogeneity of sparse labeling. Implementation. We take RandLA-Net (Hu et al. 2020) and PointNeXt-L (Qian et al. 2022) as the backbones to construct AADNet. Unless otherwise noted, AADNet is trained with default settings.
Researcher Affiliation | Collaboration | Zhiyi Pan (1, 2), Nan Zhang (1), Wei Gao (1, *), Shan Liu (3), Ge Li (1). (1) Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; (2) Peng Cheng Laboratory; (3) Media Lab, Tencent.
Pseudocode | No | The paper describes the proposed Adaptive Annotation Distribution Network (AADNet) and its components (label-aware point cloud downsampling strategy, multiplicative dynamic entropy with asynchronous training) in detail, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/panzhiyi/AADNet
Open Datasets | Yes | Dataset. S3DIS (Armeni et al. 2016) contains 271 rooms across six areas with 13 semantic categories. ... ScanNet V2 (Dai et al. 2017) consists of 1,513 scanned scenes obtained from 707 different indoor environments and provides 21 semantic categories for each point.
Dataset Splits | Yes | We train our model on Areas 1, 2, 3, 4, and 6, and evaluate the segmentation performance in Area 5. ... We utilize 1,201 scenes for training and 312 scenes for validation, according to the official split.
Hardware Specification | Yes | Our models are trained with one NVIDIA V100 GPU on S3DIS and eight NVIDIA TESLA T4 GPUs on ScanNet V2.
Software Dependencies | No | The paper mentions using RandLA-Net and PointNeXt-L as backbones, but does not specify versions for any programming languages, libraries, or other software components.
Experiment Setup | Yes | To prevent the entropy loss from misleading the network during the early stage of training (Pan et al. 2024b), we set the start epoch of asynchronous training to 50. The weight λ = 0.01 and step interval τ = 5 in Eq. 7.
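
The settings reported above can be gathered into a small configuration sketch. This is only an illustration, not the authors' code: every identifier below is a hypothetical name, the form of Eq. 7 is not restated here (so λ and τ are stored as plain values), and the assumption that asynchronous training is active from epoch 50 inclusive is ours. The helper functions show the arithmetic behind the annotation setting: 20 labeled points at roughly 0.014% implies scenes of about 143,000 points.

```python
# Values quoted from the reported experimental settings.
# All identifiers are hypothetical names chosen for this sketch.
S3DIS_TRAIN_AREAS = (1, 2, 3, 4, 6)
S3DIS_TEST_AREA = 5
SCANNET_TRAIN_SCENES = 1201
SCANNET_VAL_SCENES = 312

ASYNC_START_EPOCH = 50  # start epoch of asynchronous training
ENTROPY_WEIGHT = 0.01   # lambda in Eq. 7
STEP_INTERVAL = 5       # tau in Eq. 7 (its exact role is not restated here)


def implied_scene_size(labeled_points: int, label_rate_percent: float) -> float:
    """Total points per scene implied by a label count and a rate in percent."""
    if label_rate_percent <= 0:
        raise ValueError("label rate must be positive")
    return labeled_points / (label_rate_percent / 100.0)


def labeled_points_at_rate(total_points: int, label_rate_percent: float) -> int:
    """Number of labeled points in a scene at a given rate (at least one)."""
    return max(1, round(total_points * label_rate_percent / 100.0))


def asynchronous_training_active(epoch: int) -> bool:
    """Assumption: asynchronous training is active from the start epoch onward."""
    return epoch >= ASYNC_START_EPOCH


if __name__ == "__main__":
    print(round(implied_scene_size(20, 0.014)))
    print(labeled_points_at_rate(1_000_000, 0.01))
    print(asynchronous_training_active(49), asynchronous_training_active(50))
```

At a 0.01% label rate, a scene of one million points would carry about 100 labeled points, which gives a sense of how sparse the supervision is.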