Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding

Authors: Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang, Feng Wu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on datasets including ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed paradigm. In particular, our method achieves 95.6% accuracy on ModelNet40 and attains 90.09% performance on the most challenging classification split ScanObjectNN (PB-T50-RS).
Researcher Affiliation | Academia | 1 University of Science and Technology of China, Hefei, China; 2 Deep Space Exploration Lab, Hefei, China. Correspondence to: Yixin Zha <EMAIL>.
Pseudocode | No | The paper describes processes using mathematical formulas and architectural diagrams (e.g., Figure 2, Figure 3) but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include a link to a code repository.
Open Datasets | Yes | Extensive experiments conducted on datasets including ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed paradigm. ... In this section, we first present the implementation details in Sec. 4.1. After that, in Sec. 4.2, to demonstrate the effectiveness of the proposed paradigm, we evaluate its performance using four combinations of 2D and 3D pre-trained models on four downstream tasks, including synthetic object classification, real-world object classification, part segmentation and few-shot learning.
Dataset Splits | Yes | Real-World Shape Classification: ScanObjectNN (Uy et al., 2019) is one of the most challenging 3D datasets, which covers 15K real-world objects from 15 categories. We report classification results of three variants. ... Synthetic Shape Classification: In addition to the experiments conducted on a real-world dataset, we perform experiments on a synthetic dataset, ModelNet40 (Wu et al., 2015), which consists of 12,311 clean 3D CAD models, covering 40 object categories. ... Part Segmentation: We conduct part segmentation experiments on the challenging ShapeNetPart (Yi et al., 2016) dataset, which comprises 16,880 models with 16 different shape categories and 50 part labels.
Hardware Specification | Yes | All experiments are conducted on a single GeForce RTX 3090. ... Table 5. Training recipes for Parameter-Efficient Transfer Learning. ... GPU device GTX 3090
Software Dependencies | No | The paper mentions 'AdamW' as the optimizer and 'cosine' for the learning rate scheduler, but does not specify version numbers for any software frameworks or libraries like Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Performing fine-tuning on ScanObjectNN (Uy et al., 2019) as an example, the overall training includes 300 epochs, with a cosine learning rate (Loshchilov & Hutter, 2016) of 5e-4, and a 10-epoch warm-up period. We adopt AdamW (Loshchilov, 2017) as the optimizer. Besides, we show the BN rank of our proposed Hybrid Attention Adapter (HAA) and the rank of α and β generator, which is the dimension of the feature passed through downward projection. We also provide relevant setup of 3D-to-2D projection and 2D pretrained models, such as the resolution of 2D depth maps and the image patch size of 2D transformers. ... Table 5. Training recipes for Parameter-Efficient Transfer Learning. Config: ScanObjectNN ... learning rate: 2e-5; weight decay: 5e-2; ... training epochs: 300; warmup epochs: 10; batch size: 32; drop path rate: 0.2; Generator rank: 16; q rank of HAA: 18; k rank of HAA: 18; BN-v rank of HAA: 64; image resolution: 224; image patch size: 16/14; number of points: 2048; number of point patches: 128; point patch size: 32
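
The Experiment Setup row quotes a cosine learning-rate schedule over 300 epochs with a 10-epoch warm-up, an image resolution of 224, and image patch sizes of 16/14. As a rough illustration only, the sketch below computes a per-epoch learning rate and the number of image patches those settings imply. It is not the authors' code: the function names, the linear warm-up shape, the per-epoch stepping, and the floor of 0 are all assumptions. The quoted prose gives a learning rate of 5e-4 while the quoted Table 5 entry lists 2e-5, so the peak value is left as a required argument.

```python
import math


def lr_at_epoch(epoch, peak_lr, total_epochs=300, warmup_epochs=10, min_lr=0.0):
    """Learning rate for a 0-indexed epoch.

    Assumed shape: linear warm-up to peak_lr over warmup_epochs,
    then cosine decay towards min_lr over the remaining epochs.
    """
    if epoch < warmup_epochs:
        # Linear warm-up (an assumption; the source only says "warm-up").
        return peak_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warm-up phase already completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def patches_per_image(resolution=224, patch_size=16):
    """Number of non-overlapping square patches in a square image."""
    return (resolution // patch_size) ** 2


if __name__ == "__main__":
    for peak in (5e-4, 2e-5):  # the two values quoted in the row above
        sampled = [lr_at_epoch(e, peak) for e in (0, 9, 10, 155, 299)]
        print(peak, ["%.2e" % v for v in sampled])
    print(patches_per_image(224, 16), patches_per_image(224, 14))
```

With a 224-pixel image, patch size 16 gives a 14 x 14 grid and patch size 14 gives a 16 x 16 grid, which is where the two patch counts come from.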