PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning

Authors: Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Xiangtai Li, Yabiao Wang, Chengjie Wang

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on different point cloud learning tasks show our proposed PointRWKV outperforms the transformer- and mamba-based counterparts, while significantly saving about 42% FLOPs, demonstrating the potential option for constructing foundational 3D models.
Researcher Affiliation | Collaboration | 1Youtu Lab, Tencent; 2Zhejiang University; 3Nanyang Technological University
Pseudocode | No | The paper describes the architecture and mathematical formulations of components such as BQE and the attention mechanisms, but it does not include a distinct section or figure explicitly labeled 'Pseudocode' or 'Algorithm' with structured steps.
Open Source Code | Yes | Code: https://hithqd.github.io/projects/PointRWKV/
Open Datasets | Yes | We conduct extensive experiments on various point cloud learning tasks (e.g., classification, part segmentation, and few-shot learning) to demonstrate the effectiveness of our method. As shown in Figure 2, after self-supervised pre-training on ShapeNet (Chang et al. 2015), PointRWKV achieves 93.63% (+4.66%) overall accuracy on ScanObjectNN (Uy et al. 2019) and 96.89% (+1.79%) accuracy on ModelNet40 (Wu et al. 2015) for shape classification, and 90.26% (+3.16%) instance mIoU on ShapeNetPart (Yi et al. 2016) for part segmentation.
Dataset Splits | Yes | We conduct part segmentation on the challenging ShapeNetPart (Yi et al. 2016) dataset to predict more detailed class labels for each point within a sample. It comprises 16,880 models with 16 different shape categories and 50 part labels. ... To further evaluate the performance of PointRWKV with limited fine-tuning data, we conduct experiments for few-shot classification on ModelNet40 with an n-way, m-shot setting, where n is the number of classes randomly sampled from the dataset, and m denotes the number of samples randomly drawn from each class. As shown in Table 2, we experiment with n ∈ {5, 10} and m ∈ {10, 20}.
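The n-way, m-shot protocol quoted above can be sketched as a simple episode sampler. This is an illustrative implementation, not the paper's code; the dictionary dataset layout, the `sample_episode` name, and the 20-query-per-class default are assumptions for the sketch.

```python
import random

def sample_episode(dataset, n_way, m_shot, n_query=20, seed=None):
    """Build one few-shot episode: randomly pick n_way classes, then draw
    m_shot support samples and n_query query samples from each class."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)  # n classes at random
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(dataset[c], m_shot + n_query)  # no overlap
        support[c] = picks[:m_shot]
        query[c] = picks[m_shot:]
    return support, query

# Toy dataset: 40 classes with 100 samples each (ModelNet40-like layout).
toy = {f"class_{i}": list(range(100)) for i in range(40)}
support, query = sample_episode(toy, n_way=5, m_shot=10, seed=0)
print(len(support), [len(v) for v in support.values()])
```

With n ∈ {5, 10} and m ∈ {10, 20}, this yields the four few-shot settings evaluated in Table 2.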
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers such as Python 3.8 or PyTorch 1.9) needed to replicate the experiment.
Experiment Setup | Yes | Specifically, taking the embedded point patches as input, we first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states and a dynamic attention recurrence mechanism. To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed-radius near-neighbors graph with a graph stabilizer. Furthermore, we design PointRWKV as a multi-scale framework for hierarchical feature learning of 3D point clouds, facilitating various downstream tasks. ... Taking an input point cloud P ∈ ℝ^{N×3}, we apply multi-scale masking to obtain the M scales of point clouds. Starting with the point cloud as the initial M-th scale, the process involves iterative downsampling and grouping using Farthest Point Sampling (FPS) and k-Nearest-Neighbour (k-NN). ... The baseline is to only use a single scale of 1024 points as input and remove the BQE, the bidirectional attention mechanism, the LGM, and the graph stabilizer (GS) mechanism. As shown in Table 4, we add the shift by the bidirectional quadratic expansion (BQE) function and the modified bidirectional attention mechanism, each enhancing OA almost equally. We observe that applying the hierarchical multi-scale point cloud learning and local graph-based merging lifts the performance by a large margin, which demonstrates the importance of refined feature learning. And the absence of the graph stabilizer hurts the performance, which proves the necessity of local graph adjustments. ... In Table 5, we study the impact of the number of iterations on the accuracy.
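The iterative FPS + k-NN downsampling-and-grouping pipeline quoted above can be sketched as follows. This is a minimal NumPy illustration of the generic technique, not the paper's implementation; the function names, the 0.5 downsampling ratio, and k=16 are assumptions.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    chosen = [0]  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_samples - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]

def knn_group(points, centers, k):
    """For each center, gather its k nearest neighbours from the cloud."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return points[idx]  # shape: (n_centers, k, 3)

def build_pyramid(points, num_scales=3, ratio=0.5, k=16):
    """Scale M is the raw cloud; each coarser scale keeps `ratio` of the
    previous points via FPS and groups k-NN patches around each center."""
    scales = [points]
    for _ in range(num_scales - 1):
        prev = scales[-1]
        centers = farthest_point_sampling(prev, max(1, int(len(prev) * ratio)))
        patches = knn_group(prev, centers, k)  # local geometry per center
        scales.append(centers)
    return scales

cloud = np.random.rand(1024, 3).astype(np.float32)  # N=1024, as in the baseline
pyramid = build_pyramid(cloud)
print([s.shape[0] for s in pyramid])  # [1024, 512, 256]
```

Each grouped patch would then be embedded and fed to the PointRWKV blocks at its scale; the single-scale baseline in the ablation corresponds to using only the 1024-point level.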