RANa: Retrieval-Augmented Navigation

Authors: Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance." ... "5 Experimental results. We train and evaluate our agents on the Habitat simulator and platform (Savva et al., 2019) according to the standard ImageNav, Instance-ImageNav and ObjectNav task definitions." |
| Researcher Affiliation | Industry | All authors are affiliated with Naver Labs Europe. |
| Pseudocode | No | The paper describes the agent architecture and retrieval mechanisms using mathematical equations and textual descriptions (e.g., in Sections 3 and 4), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | "We base our agents on the DEBiT architecture by starting from the official codebase and weights provided by the authors (https://github.com/naver/debit), and extend them as described in Section 3." While the paper refers to the codebase of the base architecture (DEBiT), it does not explicitly state that the authors of *this* paper are releasing the code for their specific contributions (RANa). |
| Open Datasets | Yes | "We use the Gibson dataset (Xia et al., 2018), consisting of 72 train and 14 eval scenes... To train and test the ObjectNav variant in Section 6, we use 80 train and 20 eval scenes of HM3DSem-v0.2 (Ramakrishnan et al., 2021; Yadav et al., 2023c)." |
| Dataset Splits | Yes | "We use the Gibson dataset (Xia et al., 2018), consisting of 72 train and 14 eval scenes... To train and test the ObjectNav variant in Section 6, we use 80 train and 20 eval scenes of HM3DSem-v0.2 (Ramakrishnan et al., 2021; Yadav et al., 2023c)." |
| Hardware Specification | Yes | "Table 4: Inference runtime of different model components, in ms, timed on one Nvidia H100-80G GPU." |
| Software Dependencies | No | Appendix C mentions "Dijkstra's algorithm (Dijkstra, 1956) (from the SciPy python package)" but does not provide a version number for SciPy or for any other key software dependency such as PyTorch or Python. |
| Experiment Setup | Yes | "Model training: All models, retrieval-augmented or not, are trained with PPO for up to 200M steps. Unless specified otherwise, the geometric FM g and the encoders x and l are loaded from DEBiT and kept frozen, as is DINOv2. For retrieval-augmented agents, the context encoder c and the policy network π are learned from scratch. ... The reward definition is inspired by the PointNav (Chattopadhyay et al., 2021) and ImageNav (Bono et al., 2024a) rewards and is given as r_t = K · 1_success − ΔGeo_t − λ, where K = 10, ΔGeo_t is the increase in geodesic distance to the goal, and the slack cost λ = 0.01 encourages efficiency. ... The context size is N = 8 in all experiments." |
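The reward formula quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a hypothetical reconstruction of the garbled equation, reading it as r_t = K · 1_success − ΔGeo_t − λ; the function name and variable names are illustrative and not taken from the paper.

```python
K = 10.0      # success bonus K from the quoted setup
SLACK = 0.01  # slack cost lambda, charged every step to encourage efficiency

def step_reward(success: bool, geodesic_increase: float) -> float:
    """Per-step reward: success bonus minus the increase in geodesic
    distance to the goal minus the slack cost (illustrative sketch)."""
    return K * float(success) - geodesic_increase - SLACK

# Example: a non-terminal step that moves 0.25 m closer to the goal
# (the geodesic distance decreases, so its "increase" is -0.25):
r = step_reward(success=False, geodesic_increase=-0.25)
```

Note that progress toward the goal enters as a negative "increase" in geodesic distance, so moving closer yields a positive shaping term while the constant slack cost penalizes long trajectories.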