RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Authors: Kunming Su, Qiuxia Wu, Panpan Cai, Xiaogang Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our method is robust to rotations, achieving state-of-the-art performance on various downstream tasks." (Sec. 4, Experiments)
Researcher Affiliation | Academia | Kunming Su (1), Qiuxia Wu (1*), Panpan Cai (1), Xiaogang Zhu (2), Xuequan Lu (3), Zhiyong Wang (4), Kun Hu (4); 1: South China University of Technology; 2: The University of Adelaide; 3: The University of Western Australia; 4: The University of Sydney
Pseudocode | No | The paper describes the method using mathematical formulations and descriptive text, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code: https://github.com/kunmingsu07/RI-MAE."
Open Datasets | Yes | "We pretrained RI-MAE on ShapeNet (Chang et al. 2015), which consists of more than 50,000 3D models, covering 55 categories. We evaluate the proposed method on a classification task with a real-world dataset, ScanObjectNN (Uy et al. 2019). ModelNet40 is a synthetic shape dataset... (Wu et al. 2014). We expanded our study to include challenging semantic segmentation on the large-scale 3D scenes dataset, S3DIS (Armeni et al. 2016)."
Dataset Splits | Yes | "We expanded our study to include challenging semantic segmentation on the large-scale 3D scenes dataset, S3DIS (Armeni et al. 2016)... Areas 1-4 and 6 are for training and Area 5 is for testing."
Hardware Specification | Yes | "All experiments were conducted utilizing two RTX 2080Ti GPUs."
Software Dependencies | Yes | "All experiments were conducted utilizing two RTX 2080Ti GPUs, with the PyTorch framework version 1.7."
Experiment Setup | Yes | "Following existing MPM methods, we utilized FPS and KNN to divide an input point cloud into G point patches with K = 32 points for each patch. For point cloud classification, we set G = 64, while for the segmentation tasks, G = 256. The RI-Transformer encoders in RI-MAE contain 12 transformer layers, while the predictor has only one transformer block. For each transformer block, we set the internal dimension to 384 and the number of heads to 6. For training details, we utilized an AdamW optimizer with cosine learning rate decay, applying a decay factor of 0.05, and incorporated a 10-epoch warm-up phase. The learning rate was set to 0.0005 for both pretraining and classification, while for segmentation it was adjusted to 0.0002."
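The patchification step quoted above (farthest point sampling of G centres, then KNN grouping of K = 32 points per centre) can be sketched as follows. This is an illustrative NumPy re-implementation under my own function names, not the authors' released code; RI-MAE's actual pipeline operates on batched PyTorch tensors.

```python
import numpy as np

def farthest_point_sample(points, g):
    """Greedy FPS: pick g well-spread centre indices from an (N, 3) cloud."""
    n = points.shape[0]
    centers = np.zeros(g, dtype=int)
    dist = np.full(n, np.inf)   # distance of each point to its nearest chosen centre
    farthest = 0                # start from an arbitrary point
    for i in range(g):
        centers[i] = farthest
        d = np.sum((points - points[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        farthest = int(np.argmax(dist))  # next centre: point farthest from all chosen
    return centers

def knn_patches(points, g=64, k=32):
    """Divide an (N, 3) point cloud into g patches of its k nearest points each."""
    centers = farthest_point_sample(points, g)
    # Squared distances from every point to every centre: shape (N, g).
    d2 = np.sum((points[:, None, :] - points[centers][None, :, :]) ** 2, axis=2)
    idx = np.argsort(d2, axis=0)[:k].T   # (g, k): k nearest point indices per centre
    return points[idx]                   # (g, k, 3)

cloud = np.random.rand(1024, 3)
patches = knn_patches(cloud, g=64, k=32)  # classification setting: G = 64, K = 32
print(patches.shape)  # (64, 32, 3)
```

For the segmentation setting described in the paper, the same routine would be called with g=256. Note that KNN patches built this way may overlap, which is the standard behaviour in masked point modelling pipelines.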