Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

NaviFormer: A Spatio-Temporal Context-Aware Transformer for Object Navigation

Authors: Wei Xie, Haobo Jiang, Yun Zhu, Jianjun Qian, Jin Xie

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Gibson, Habitat-Matterport3D (HM3D), and Matterport3D (MP3D) datasets demonstrate the superiority of our approach.
Researcher Affiliation | Academia | (1) PCA Lab, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; (2) Nanyang Technological University, Singapore; (3) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (4) School of Intelligence Science and Technology, Nanjing University, Suzhou, China. EMAIL; EMAIL; EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, and presents a framework diagram (Fig. 2), but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code: https://github.com/Xie-Nav/NaviFormer
Open Datasets | Yes | Our NaviFormer is evaluated on three common datasets: Gibson, Habitat-Matterport3D (HM3D), and Matterport3D (MP3D).
Dataset Splits | Yes | For Gibson, we select 25 train/5 val scenes based on the Gibson tiny split; 1000 val episodes are used to demonstrate model performance. For HM3D, 80 train/20 val scenes are selected, and 2000 val episodes are used. Six object categories are chosen as target categories on Gibson and HM3D. For MP3D, 56 train/11 val scenes are employed, and 2195 val episodes are utilized.
Hardware Specification | No | The paper mentions using the Habitat platform but does not provide specific hardware details, such as the GPU or CPU models used to run the experiments.
Software Dependencies | No | The paper mentions using the Habitat platform and specific models such as RedNet and Mask R-CNN, but does not provide version numbers for these or any other software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | For the agent interacting with Gibson, HM3D, and MP3D, the 3D indoor simulator Habitat platform (Savva et al. 2019) is employed to drive the three 3D scene datasets. The size of the egocentric RGB-D images received by the agent is (4, 480, 640). We follow the previous method (Yu, Kasaei, and Cao 2023c) to perform semantic segmentation using RedNet (Jiang et al. 2018) and Mask R-CNN (He et al. 2017). The width and height of the BEV map are (480, 480). The agent rotates 30° and moves 0.25 m forward at each time step.
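The dataset splits and agent settings extracted above can be collected into plain data structures for quick sanity checks. This is a hedged sketch, not the authors' code: every key name (`forward_step_m`, `turn_angle_deg`, etc.) is an illustrative assumption; only the numeric values come from the quoted paper text.

```python
# Illustrative summary of the reported setup; key names are assumptions,
# numeric values are taken from the paper excerpts above.

DATASET_SPLITS = {
    "Gibson": {"train_scenes": 25, "val_scenes": 5, "val_episodes": 1000},
    "HM3D":   {"train_scenes": 80, "val_scenes": 20, "val_episodes": 2000},
    "MP3D":   {"train_scenes": 56, "val_scenes": 11, "val_episodes": 2195},
}

AGENT_CONFIG = {
    "forward_step_m": 0.25,       # move 0.25 m forward per step
    "turn_angle_deg": 30,         # rotate 30 degrees per turn
    "rgbd_shape": (4, 480, 640),  # (channels, height, width): RGB + depth
    "bev_map_size": (480, 480),   # (width, height) of the BEV map
}

def turns_for_full_rotation(cfg):
    """Discrete turn actions needed for a full 360-degree sweep."""
    return 360 // cfg["turn_angle_deg"]

print(turns_for_full_rotation(AGENT_CONFIG))  # -> 12
```

With a 30° turn, the agent needs 12 discrete turns for a full rotation, which is the kind of consistency check such a summary makes easy.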