FloNa: Floor Plan Guided Embodied Visual Navigation
Authors: Jiaxin Li, Weiqi Huang, Zan Wang, Wei Liang, Huijun Di, Feng Liu
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further collect 20k navigation episodes across 117 scenes in the iGibson simulator to support the training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our method in navigating within unseen environments using a floor plan. |
| Researcher Affiliation | Collaboration | (1) Beijing Institute of Technology, Beijing, China; (2) Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing, China; (3) Beijing Racobit Electronic Information Technology Co., Ltd. EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using equations and textual descriptions of processes (e.g., in the 'Diffusion Model' and 'Diffusion Policy' sections) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We recommend referring to our project website for the demonstration video of the planning results. |
| Open Datasets | No | For benchmarking, we collect a dataset comprising approximately 20k navigation episodes across 117 distinct scenes using the iGibson simulator (Li et al. 2021a). The dataset includes around 3.3M images captured with a 45-degree field of view. We split the scenes into 67 for training and 50 for testing to assess the model's generalization capability to unseen environments. Each scene comprises a floor plan, a traversability map, and sufficient navigation episodes. Each episode contains an A*-generated trajectory paired with corresponding RGB observations. |
| Dataset Splits | Yes | We split the scenes into 67 for training and 50 for testing to assess the model's generalization capability to unseen environments. The dataset includes around 3.3M images captured with a 45-degree field of view. We train FloDiff on the training set, which consists of 67 indoor scenes, encompassing 11,575 episodes and approximately 26 hours of trajectory data. |
| Hardware Specification | Yes | We train FloDiff using one NVIDIA RTX 3090 GPU and assign a batch size of 256. Our model achieves an inference rate of approximately 1.88 Hz when running on an NVIDIA Jetson AGX Orin. |
| Software Dependencies | No | In the implementation, FloDiff is trained for 5 epochs using the AdamW (Loshchilov, Hutter et al. 2017) optimizer with a fixed learning rate of 0.0001. The attention layers are built using the native PyTorch implementation. The diffusion policy is trained using the squared cosine noise scheduler (Nichol and Dhariwal 2021) with K = 10 denoising steps. |
| Experiment Setup | Yes | In the implementation, FloDiff is trained for 5 epochs using the AdamW (Loshchilov, Hutter et al. 2017) optimizer with a fixed learning rate of 0.0001. We empirically set λ1 = λ3 = 0.001 and λ2 = 0.005. The attention layers are built using the native PyTorch implementation. The number of multi-head attention layers and heads are both 4. We set the dimension of the observation context vector ct to 256. The diffusion policy is trained using the squared cosine noise scheduler (Nichol and Dhariwal 2021) with K = 10 denoising steps. The noise prediction network ϵθ adopts a conditional U-Net architecture following (Janner et al. 2022) with 15 convolutional layers. We set the diffusion horizon as Hp = 32 and employ the first Ha = 16 steps to execute in each iteration. |
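The squared cosine noise scheduler referenced in the setup above follows the iDDPM formulation of Nichol and Dhariwal (2021). As a minimal sketch of what K = 10 denoising steps implies, the snippet below computes the cumulative ᾱ values and per-step β variances under the standard formulation (offset s = 0.008, β clipped at 0.999); the function names are illustrative, not from the paper's code, which is not released.

```python
import math

def squared_cosine_alpha_bars(K: int, s: float = 0.008) -> list[float]:
    """Cumulative alpha-bar values of the squared-cosine schedule
    (Nichol & Dhariwal 2021), evaluated at K discrete steps."""
    def f(t: float) -> float:
        return math.cos((t / K + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, K + 1)]

def betas_from_alpha_bars(alpha_bars: list[float],
                          max_beta: float = 0.999) -> list[float]:
    """Per-step noise variances beta_t = 1 - alpha_bar_t / alpha_bar_{t-1},
    clipped at max_beta as in the original formulation."""
    betas, prev = [], 1.0
    for ab in alpha_bars:
        betas.append(min(1.0 - ab / prev, max_beta))
        prev = ab
    return betas

K = 10  # denoising steps reported for FloDiff
alpha_bars = squared_cosine_alpha_bars(K)
betas = betas_from_alpha_bars(alpha_bars)
```

With so few steps, each β is comparatively large, which is consistent with the paper's emphasis on fast inference (≈1.88 Hz on a Jetson AGX Orin).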