LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba

Authors: Yubo Cui, Zhiheng Li, Jiaqiang Wang, Zheng Fang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the SemanticKITTI and SSCBench-KITTI360 datasets show that our algorithm achieves new state-of-the-art performances in both geometric and semantic completion tasks.
Researcher Affiliation Academia Yubo Cui, Zhiheng Li, Jiaqiang Wang, Zheng Fang* Faculty of Robot Science and Engineering Northeastern University EMAIL, EMAIL
Pseudocode No The paper does not contain explicitly labeled pseudocode or algorithm blocks. Figure 3 shows a block diagram of the Triplane Fusion Mamba Block, but it is not pseudocode.
Open Source Code No Our code will be open soon.
Open Datasets Yes Following previous works (Jiang et al. 2024), we evaluate the proposed LOMA on Semantic KITTI (Behley et al. 2019) and SSCBench-KITTI360 (Li et al. 2023a) datasets.
Dataset Splits Yes SemanticKITTI comprises 22 driving sequences, with an official split of 10, 1, and 11 sequences for training, validation, and testing respectively. The input RGB images have sizes of 1226 × 370, and the annotation label has 20 semantic classes. The output scene covers an area of 51.2m × 51.2m × 6.4m and is voxelized into a grid with a shape of 256 × 256 × 32 using voxels of size 0.2m. SSCBench-KITTI360 includes 7 training sequences, 1 validation sequence, and 1 testing sequence. Its input RGB images have sizes of 1408 × 376, and the annotation label has 19 semantic classes. SSCBench-KITTI360 also uses a voxel grid of 256 × 256 × 32.
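The voxelization arithmetic in the quote above can be checked directly: a 51.2m × 51.2m × 6.4m scene divided by a 0.2m voxel size yields the stated 256 × 256 × 32 grid. A minimal sketch (variable names are illustrative, not from the paper):

```python
# Verify the voxel grid shape implied by the dataset description:
# scene extent (meters) divided by voxel edge length (meters).
scene_extent_m = (51.2, 51.2, 6.4)
voxel_size_m = 0.2

# Round to the nearest integer to absorb floating-point error (51.2 / 0.2
# may not be exactly 256.0 in binary floating point).
grid_shape = tuple(round(d / voxel_size_m) for d in scene_extent_m)
print(grid_shape)  # (256, 256, 32)
```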
Hardware Specification Yes We train our LOMA for 30 epochs on 4 NVIDIA 3090 GPUs, with a batch size of 4, and employ random horizontal flip augmentations.
Software Dependencies No The paper mentions using ResNet-50, the LSeg model, MobileStereoNet, and the AdamW optimizer, but it does not specify any software dependency versions (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We utilize the AdamW optimizer with an initial learning rate of 2 × 10⁻⁴ and a weight decay of 10⁻⁴. We train our LOMA for 30 epochs on 4 NVIDIA 3090 GPUs, with a batch size of 4, and employ random horizontal flip augmentations.
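The optimizer configuration quoted above maps directly onto PyTorch's built-in AdamW. A minimal sketch, assuming a PyTorch implementation (the paper's code is not yet released, so the model below is a stand-in placeholder, not the LOMA network):

```python
import torch

# Placeholder module standing in for the (unreleased) LOMA network.
model = torch.nn.Linear(8, 4)

# AdamW with the hyperparameters reported in the paper:
# initial learning rate 2e-4, weight decay 1e-4.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,
    weight_decay=1e-4,
)
```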