ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Authors: Xinxin Zhao, Wenzhe Cai, Likun Tang, Teng Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments on the challenging open-vocabulary object navigation benchmarks demonstrate the superiority of our proposed system.
Researcher Affiliation | Academia | Xinxin Zhao, Wenzhe Cai, Likun Tang, Teng Wang; School of Automation, Southeast University
Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2 for the overall pipeline) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate the effectiveness and navigation efficiency of our proposed method using the Habitat v3.0 simulator (Puig et al., 2023) on two standard ObjectNav datasets: HM3D (Ramakrishnan et al., 2021) and HSSD (Khanna et al., 2023).
Dataset Splits | Yes | The HM3D dataset offers high-fidelity reconstructions of 20 entire buildings, including 80 training scenes and 20 validation scenes. The HSSD dataset provides 40 high-quality synthetic scenes, comprising 110 training scenes and 40 validation scenes.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, memory, or cloud computing instance types) used for running the experiments.
Software Dependencies | No | The paper mentions the Habitat v3.0 simulator and models such as GPT-4o-mini, but does not specify version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch), or other ancillary software components used for implementation.
Experiment Setup | Yes | Each episode has a maximum limit of 500 steps. The Move Ahead action moves the agent forward by 0.25 m, while the rotational actions Turn Left and Turn Right rotate the agent by 30 degrees. The task is considered successful if the agent reaches the target object with a geodesic distance smaller than a defined threshold (e.g., 1 m) and executes the Stop command within a fixed number of steps. For the data collection of the Where2Imagine module, human demonstration trajectories were taken from the MP3D (Chang et al., 2017) dataset within the habitat-web project, with a camera height of 0.88 m and a horizontal field of view (HFOV) of 79°. The Where2Imagine model with T=11, using a ResNet-18 trained from scratch and GPT-4o-mini as the VLM, was evaluated over 200 epochs on the HM3D and HSSD datasets.
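The episode parameters reported above can be collected into a small configuration sketch. This is a hypothetical illustration, not the authors' code: the names `EpisodeConfig` and `is_success` are assumptions, and only the numeric values (500 steps, 0.25 m forward step, 30° turns, 1 m success threshold, 0.88 m camera height, 79° HFOV) come from the paper's reported setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeConfig:
    """Episode parameters as reported in the paper's experiment setup."""
    max_steps: int = 500           # maximum steps per episode
    forward_step_m: float = 0.25   # Move Ahead displacement
    turn_angle_deg: float = 30.0   # Turn Left / Turn Right increment
    success_dist_m: float = 1.0    # geodesic success threshold (e.g., 1 m)
    camera_height_m: float = 0.88  # camera height for Where2Imagine data
    hfov_deg: float = 79.0         # horizontal field of view


def is_success(geodesic_dist_m: float, steps_taken: int, stop_called: bool,
               cfg: EpisodeConfig = EpisodeConfig()) -> bool:
    """An episode counts as successful only if the agent executes Stop
    within the step budget while its geodesic distance to the target
    is below the threshold."""
    return (stop_called
            and steps_taken <= cfg.max_steps
            and geodesic_dist_m < cfg.success_dist_m)
```

For example, stopping at 0.8 m after 120 steps would count as a success, while stopping at 1.2 m, or never calling Stop, would not.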