Geometric Feature Embedding for Effective 3D Few-Shot Class Incremental Learning

Authors: Xiangqi Li, Libo Huang, Zhulin An, Weilun Feng, Chuanguang Yang, Boyu Diao, Fei Wang, Yongjun Xu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on several publicly available 3D point cloud datasets, including ModelNet, ShapeNet, ScanObjectNN, and CO3D, demonstrate 3D-FLEG's superiority over existing state-of-the-art 3D FSCIL methods. We validated 3D-FLEG on the ModelNet, ShapeNet, ScanObjectNN, and CO3D point cloud datasets, demonstrating superior performance in mean accuracy, harmonic mean accuracy, and accuracy drop percentage compared to existing 3D FSCIL methods. Our experiments use three key metrics. Average Accuracy: overall accuracy computed after each incremental step, covering both base and new classes. Relative Accuracy Drop Rate: introduced to quantify performance degradation during incremental learning. Harmonic Accuracy (Ah): the harmonic mean of base-class and new-class accuracy, adopted to balance performance on old and new classes, especially given the limited data for new classes. To evaluate the effectiveness of our proposed method and its individual modules, we conducted a series of ablation studies on the cross-dataset task from ModelNet to ScanObjectNN.
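The three evaluation metrics quoted in this row can be sketched as below. The harmonic accuracy and average accuracy follow the standard definitions named in the excerpt; the relative drop rate formula (drop from the first step's accuracy to the last, as a percentage of the first) is an assumption based on common FSCIL practice, since the exact expression is not reproduced in the quote.

```python
def average_accuracy(step_accs):
    """Mean of the overall accuracies measured after each incremental step
    (each accuracy covers both base and new classes)."""
    return sum(step_accs) / len(step_accs)

def relative_drop_rate(acc_first, acc_last):
    """Assumed form of the relative accuracy drop rate: the performance
    loss from the first to the last step, as a percentage of the first."""
    return (acc_first - acc_last) / acc_first * 100.0

def harmonic_accuracy(acc_old, acc_new):
    """Harmonic mean (Ah) of accuracy on old (base) and new classes,
    which penalizes imbalance between the two."""
    if acc_old + acc_new == 0:
        return 0.0
    return 2 * acc_old * acc_new / (acc_old + acc_new)
```

For example, a model at 80% on old classes but only 40% on new classes scores Ah ≈ 53.3%, well below the 60% plain average, which is why harmonic accuracy is preferred when new-class data is scarce.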
Researcher Affiliation | Academia | Institute of Computing Technology, Chinese Academy of Sciences, No. 158 Beiqing Road, Haidian District, Beijing, 100095, China. Correspondence to: Zhulin An <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text but does not include any clearly labeled pseudocode blocks or algorithm listings with structured steps.
Open Source Code | Yes | Code is available at https://github.com/lixiangqi707/3D-FLEG.
Open Datasets | Yes | Experiments conducted on several publicly available 3D point cloud datasets, including ModelNet, ShapeNet, ScanObjectNN, and CO3D, demonstrate 3D-FLEG's superiority over existing state-of-the-art 3D FSCIL methods. We validated 3D-FLEG on the ModelNet, ShapeNet, ScanObjectNN, and CO3D point cloud datasets, demonstrating superior performance in mean accuracy, harmonic mean accuracy, and accuracy drop percentage compared to existing 3D FSCIL methods (Wu et al., 2015; Chang et al., 2015; Uy et al., 2019; Reizenstein et al., 2021; Chowdhury et al., 2022; Tan & Xiang, 2024; Cheraghian et al., 2025; Ahmadi et al., 2024).
Dataset Splits | Yes | For dataset partitioning, we first conducted within-dataset experiments to establish baseline performance. Using ModelNet, we allocated 20 base classes, with the remaining 20 classes divided into four incremental stages. For ShapeNet and CO3D, we utilized 25 base classes, distributing the incremental classes across 7 or 6 tasks respectively, comprising either 30 or 25 incremental classes. To simulate limited real-scanned data for new categories, we also designed cross-dataset experiments. For the transition from ModelNet to ScanObjectNN, we followed (Chowdhury et al., 2022) with four tasks. In the case of ShapeNet to ScanObjectNN, we structured four tasks involving 44 ShapeNet base classes and 15 ScanObjectNN incremental classes. Lastly, for ShapeNet to CO3D, we set up eleven tasks with 44 ShapeNet base classes and 50 CO3D incremental classes, representing the most challenging setup. In the implementation of the incremental stages, we randomly selected five samples per category and retained one sample from previously learned categories, simulating practical data scarcity.
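The incremental-stage selection protocol described in this row (five random samples per new category, one retained exemplar per previously learned category) can be sketched with a hypothetical helper; `build_incremental_split`, its argument names, and the dict-of-lists data layout are illustrative assumptions, not the paper's code.

```python
import random

def build_incremental_split(samples_by_class, new_classes, old_classes,
                            shots=5, exemplars=1, seed=0):
    """Hypothetical sketch of the few-shot incremental data selection:
    pick `shots` random samples for each new class and retain
    `exemplars` samples from each previously learned class."""
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    split = {}
    for cls in new_classes:
        split[cls] = rng.sample(samples_by_class[cls], shots)
    for cls in old_classes:
        split[cls] = rng.sample(samples_by_class[cls], exemplars)
    return split
```

With the defaults this reproduces the 5-shot / 1-exemplar regime used in the paper's incremental stages.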
Hardware Specification | Yes | Additionally, the entire experimental process was conducted on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using specific models such as the "EVA02-E-14+ CLIP model" and the "eva02-base-patch14-448 model" as point cloud encoders, and the "AdamW optimizer", but does not provide version numbers for any software libraries, programming languages, or environments.
Experiment Setup | Yes | The dynamic geometric feature projection cluster contains 1024 base vectors and has an update rate of 0.1 during the incremental phase. For all samples, we select 1024 points from the 3D point cloud objects using the farthest point sampling method as the input. Considering the computational overhead and model performance, our experiment is configured the same as (Ahmadi et al., 2024). We employed the EVA02-E-14+ CLIP model and the eva02-base-patch14-448 model as our point cloud encoder. The Transformer encoder comprises 2 standard layers, each with 8-head self-attention. The optimizer is AdamW, with the weight decay set to 1e-4. During base-category training, we trained for 10 epochs with a learning rate of 0.0005. For the new categories, we increased the training to 50 epochs and set the learning rate to 0.001, maintaining a fixed batch size of 32.
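The farthest point sampling step mentioned in this row (selecting 1024 input points per object) is a standard greedy algorithm; a minimal NumPy sketch is shown below. This is an illustrative implementation, not the paper's code, and practical pipelines typically use an optimized GPU version.

```python
import numpy as np

def farthest_point_sampling(points, n_samples=1024, seed=0):
    """Greedy farthest point sampling over an (N, 3) point cloud:
    start from a random point, then repeatedly add the point that is
    farthest from the already-selected set."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = rng.integers(n)
    # dist[j] = distance from point j to its nearest selected point
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for i in range(1, n_samples):
        selected[i] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[selected[i]], axis=1))
    return points[selected]
```

The greedy rule yields a subset that covers the object's surface far more evenly than uniform random sampling, which is why it is the common choice for fixing the input size of point cloud encoders.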