SGFormer: Semantic-Geometry Fusion Transformer for Multi-modal 3D Panoptic Segmentation
Authors: Hongqi Yu, Sixian Chan, Xiaolong Zhou, Xiaoqin Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Notably, SGFormer achieves state-of-the-art (SOTA) results on nuScenes and SemanticPOSS, as well as yielding competitive performance on SemanticKITTI. Moreover, SGFormer exhibits superior robustness compared to leading methods, marking an improvement of 2% to 10%. Table 1: Comparison of 3D panoptic segmentation on the nuScenes validation set, in which PQ% is the primary metric for comparison. The first- and second-best results are highlighted in bold and underline, respectively. Table 2: Comparison on the nuScenes test set. Table 3: Comparison on the SemanticKITTI validation set. Table 4: Comparison on the SemanticPOSS validation set. Table 5: Competitive results under different robustness settings. Table 6: Ablation study of network architecture. Table 7: Detailed ablation study for the ASCA. Table 8: Ablation study for the SGTransformer. |
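The evidence above names PQ% as the primary comparison metric. A minimal sketch of the standard panoptic quality (PQ) definition (from Kirillov et al.'s panoptic segmentation formulation, not SGFormer-specific code) may help when reproducing the reported numbers; the function name and signature are illustrative assumptions:

```python
# Sketch of the standard panoptic quality (PQ) metric: predicted and
# ground-truth segments that match with IoU > 0.5 count as true positives,
# and PQ averages the matched IoUs while penalizing unmatched segments.

def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = sum(IoU over TP) / (|TP| + 0.5 * |FP| + 0.5 * |FN|).

    matched_ious: IoU values of true-positive segment pairs (each > 0.5).
    num_fp:       unmatched predicted segments (false positives).
    num_fn:       unmatched ground-truth segments (false negatives).
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0
```

For example, two perfect matches with no spurious or missed segments give PQ = 1.0, while one 0.8-IoU match plus one false positive and one false negative gives PQ = 0.8 / 2 = 0.4.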
| Researcher Affiliation | Academia | Hongqi Yu¹, Sixian Chan²*, Xiaolong Zhou³, Xiaoqin Zhang¹*. ¹Key Laboratory of Intelligent Informatics for Safety and Emergency of Zhejiang Province, Wenzhou University, China; ²College of Computer Science and Technology, Zhejiang University of Technology, China; ³College of Electrical and Information Engineering, Quzhou University, China |
| Pseudocode | No | The paper describes the methodology using descriptive text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets. nuScenes (Fong et al. 2022) is a large-scale benchmark containing 1000 scenes. SemanticKITTI (Behley et al. 2019) is an outdoor dataset consisting of 22 sequences. SemanticPOSS (Pan et al. 2020) is a challenging benchmark including 2988 scenes across 6 sequences. ... Table 9: Comparison (mIoU) of different baselines with ASCA (*) on Cityscapes (val), PASCAL VOC (val), and CamVid (test). ... Table 10: Comparison on out-of-distribution generalization. ... from the Robo3D (Kong et al. 2023) benchmark. |
| Dataset Splits | Yes | Results on nuScenes. As shown in Table 1, SGFormer outperforms state-of-the-art methods with higher panoptic segmentation performance on the nuScenes val set. Specifically, our method surpasses the recent LCPS (Zhang et al. 2023) by 1.1% on PQ and 0.8% on mIoU. Moreover, in Table 2, our SGFormer achieves better results than Panoptic-PHNet (Li et al. 2022) and further surpasses LCPS on all metrics. These results demonstrate that SGFormer can better distinguish objects through semantic-geometry fusion, significantly advancing 3D panoptic segmentation. Results on SemanticKITTI and SemanticPOSS. ... Additionally, on SemanticPOSS, which features much smaller and sparser point clouds, SGFormer surpasses existing methods across almost all metrics in Table 4. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Implementation Details. The specific details are provided in the supplementary material. In ASCA, the groups g = 8 for alignment. In SGTransformer, we set δ to 0.1 and use two fusion layers, each layer with four self-attention and one cross-attention equipped with 128 input channels. In terms of loss weights, we set λhm = 100, λo = 10 and λc = 1. |
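The hyperparameters quoted above can be collected into a single configuration sketch for reimplementation. The structure, field names, and loss-term interpretation (heatmap, offset, classification) below are assumptions; only the values (g = 8, δ = 0.1, two fusion layers with four self-attention and one cross-attention blocks, 128 channels, λhm = 100, λo = 10, λc = 1) come from the paper:

```python
# Hedged sketch of SGFormer's reported hyperparameters, gathered into one
# config dict. Key names are illustrative; values are from the paper.
SGFORMER_CONFIG = {
    "asca": {"groups": 8},               # g = 8 groups for alignment
    "sg_transformer": {
        "delta": 0.1,                    # δ in the SGTransformer
        "fusion_layers": 2,              # two fusion layers
        "self_attn_per_layer": 4,        # four self-attention blocks each
        "cross_attn_per_layer": 1,       # one cross-attention block each
        "channels": 128,                 # input channels
    },
    "loss_weights": {"hm": 100.0, "o": 10.0, "c": 1.0},
}

def total_loss(l_hm, l_o, l_c, weights=SGFORMER_CONFIG["loss_weights"]):
    """Weighted sum of the three loss terms (term names are assumptions)."""
    return weights["hm"] * l_hm + weights["o"] * l_o + weights["c"] * l_c
```

With unit-valued terms, the weighting yields 100 + 10 + 1 = 111, which makes the relative emphasis on the λhm-weighted term explicit.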