Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Authors: Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Min Wu, Ming-Ming Cheng, Ender Konukoglu, Serge Belongie

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the S3DIS and ScanNet datasets demonstrate significant performance improvements achieved by our method. |
| Researcher Affiliation | Academia | Zhaochong An (1), Guolei Sun (2), Yun Liu (3), Runjia Li (4), Min Wu (5), Ming-Ming Cheng (3), Ender Konukoglu (2), and Serge Belongie (1). Affiliations: (1) Department of Computer Science, University of Copenhagen; (2) Computer Vision Laboratory, ETH Zurich; (3) College of Computer Science, Nankai University; (4) Department of Engineering Science, University of Oxford; (5) Institute for Infocomm Research, A*STAR. |
| Pseudocode | No | The paper describes its methodology in text and mathematical equations but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at this link. |
| Open Datasets | Yes | Experimental results on the S3DIS and ScanNet datasets demonstrate significant performance improvements achieved by our method. |
| Dataset Splits | Yes | Following Zhao et al. (2021), we divide the large-scale scenes into 1m × 1m blocks. We adhere to the standard data processing protocol from An et al. (2024), voxelizing raw input points within each block using a 0.02m grid size and uniformly sampling to maintain a maximum of 20,480 points per block. The evaluation sets consist of 1,000 episodes per class in the 1-way setting and 100 episodes per class combination in the 2-way setting. |
| Hardware Specification | Yes | Training and inference are conducted on four RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions models such as LSeg and OpenSeg but does not provide version numbers for key ancillary software components, such as programming languages, libraries, or frameworks (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The initial pretraining phase spans 100 epochs, while the subsequent meta-learning phase includes 40,000 episodes, following An et al. (2024). For optimization, we use the AdamW optimizer, setting a weight decay of 0.01 and a learning rate of 0.006 during pretraining. The learning rate is reduced to 0.0001 during the meta-learning phase. |
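The block preprocessing quoted under "Dataset Splits" (0.02m voxelization followed by uniform subsampling to at most 20,480 points per 1m × 1m block) could be sketched as follows. This is a minimal illustration of the described protocol, not the paper's actual pipeline; the function names and the random example block are hypothetical.

```python
import numpy as np

def voxel_downsample(points, grid_size=0.02):
    """Keep one point per occupied voxel of side `grid_size` meters.

    `points` is an (N, D) array whose first three columns are xyz.
    """
    coords = np.floor(points[:, :3] / grid_size).astype(np.int64)
    # Unique voxel indices; keep the first point that falls in each voxel.
    _, keep = np.unique(coords, axis=0, return_index=True)
    return points[np.sort(keep)]

def cap_points(points, max_points=20480, rng=None):
    """Uniformly subsample to at most `max_points` points."""
    if rng is None:
        rng = np.random.default_rng(0)
    if len(points) <= max_points:
        return points
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx]

# Hypothetical example: a dense 1m x 1m block with xyz + rgb features.
block = np.random.rand(50000, 6).astype(np.float32)
processed = cap_points(voxel_downsample(block, grid_size=0.02))
```

Voxel downsampling bounds the point density before the uniform cap, so the final sample covers the block evenly rather than oversampling dense regions.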
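The optimizer settings quoted under "Experiment Setup" (AdamW, weight decay 0.01, learning rate 0.006 for pretraining and 0.0001 for meta-learning) translate directly to PyTorch. The model below is a stand-in placeholder, not the paper's segmentation architecture.

```python
import torch

# Placeholder model; the paper's backbone is not reproduced here.
model = torch.nn.Linear(16, 13)

# Pretraining phase: AdamW with lr=0.006 and weight decay 0.01.
pretrain_opt = torch.optim.AdamW(model.parameters(), lr=0.006, weight_decay=0.01)

# Meta-learning phase: learning rate reduced to 0.0001.
meta_opt = torch.optim.AdamW(model.parameters(), lr=0.0001, weight_decay=0.01)
```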