Exploring Semantic Masked Autoencoder for Self-supervised Point Cloud Understanding
Authors: Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed modules. |
| Researcher Affiliation | Academia | Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang — University of Science and Technology of China / Deep Space Exploration Lab |
| Pseudocode | No | The paper describes its methodology in Section 3, including subsections on 'MPM-based 3D architecture', 'Component Semantic Modeling', 'Component Semantic-enhanced Masking Strategy', and 'Component Semantic-enhanced Prompt-tuning', which explain the processes and equations. However, it does not contain any clearly labeled pseudocode blocks or algorithms in a structured format. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository. Phrases like 'We release our code...' or 'The source code for our method is available at...' are absent. |
| Open Datasets | Yes | Extensive experiments conducted on datasets such as ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed modules. We adopt the well-known ShapeNet [Chang et al., 2015] for self-supervised point cloud pre-training, which contains 57,448 synthetic point clouds covering 55 object categories. We conduct experiments on the ScanObjectNN [Uy et al., 2019] dataset, which consists of about 15,000 objects from 15 categories. We perform experiments on a synthetic dataset, ModelNet40 [Wu et al., 2015], which consists of 12,311 clean 3D CAD models covering 40 object categories. We conduct part segmentation experiments on the challenging ShapeNetPart [Yi et al., 2016] dataset, which comprises 16,880 models with 16 different shape categories and 50 part labels. |
| Dataset Splits | No | The paper mentions several datasets (ShapeNet, ScanObjectNN, ModelNet40, ShapeNetPart) and discusses various experimental settings (e.g., OBJ-BG, OBJ-ONLY, PB-T50-RS for ScanObjectNN; with/without voting trick for ModelNet40; few-shot settings like 5-way 10-shot). However, it does not provide explicit details about the training, validation, or test dataset splits (e.g., specific percentages or sample counts for each split) for these datasets. It assumes knowledge of standard splits without explicitly stating them or citing how they were defined. |
| Hardware Specification | No | The paper describes the model architecture, including 'Transformer blocks' and 'Mamba structures' and their dimensions, but it does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper describes the implementation using frameworks like 'Transformer' and 'Mamba', but it does not provide specific version numbers for any software dependencies, such as programming languages, libraries (e.g., PyTorch, TensorFlow), or other tools. |
| Experiment Setup | Yes | For Point MAE, we adopt a typical input resolution with 1024 points and divide inputs into n = 64 point patches. For the KNN algorithm, we set k = 32. In the backbone, the encoder has 12 Transformer blocks while the decoder has 4 Transformer blocks. Each Transformer block has 384 hidden dimensions and 6 heads. For Point Mamba, the network architecture remains consistent with Point MAE, but all the Transformers have been replaced with Mamba [Gu and Dao, 2023] structures. |
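The patching step quoted in the Experiment Setup row (1024 input points divided into n = 64 patches, each gathered with KNN at k = 32) can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the function name `knn_patches` is hypothetical, and random center selection stands in for the farthest point sampling that masked-point-modeling pipelines such as Point-MAE typically use.

```python
import numpy as np

def knn_patches(points, n_patches=64, k=32, seed=0):
    """Group a point cloud into local patches.

    Pick n_patches center points (random subsample here; the paper's
    pipeline would normally use farthest point sampling), then gather
    each center's k nearest neighbors to form a patch.
    """
    rng = np.random.default_rng(seed)
    center_idx = rng.choice(len(points), n_patches, replace=False)
    centers = points[center_idx]                        # (n_patches, 3)
    # Pairwise Euclidean distances: each center vs. all input points.
    dists = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, :k]          # (n_patches, k)
    return points[knn_idx]                              # (n_patches, k, 3)

# A synthetic cloud at the resolution quoted in the setup: 1024 points.
cloud = np.random.default_rng(1).standard_normal((1024, 3))
patches = knn_patches(cloud)
print(patches.shape)  # (64, 32, 3)
```

Each resulting patch would then be embedded and fed to the 12-block, 384-dimensional encoder described in the row above (or its Mamba counterpart for Point Mamba).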