RANa: Retrieval-Augmented Navigation

Authors: Gianluca Monaci, Rafael S. Rezende, Romain Deffayet, Gabriela Csurka, Guillaume Bono, Hervé Déjean, Stéphane Clinchant, Christian Wolf

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We propose new benchmarks for these settings and we show that retrieval allows zero-shot transfer across tasks and environments while significantly improving performance." ... "5 Experimental results. We train and evaluate our agents on the Habitat simulator and platform (Savva et al., 2019) according to the standard ImageNav, Instance-ImageNav and ObjectNav task definitions." |
| Researcher Affiliation | Industry | All authors are affiliated with Naver Labs Europe. |
| Pseudocode | No | The paper describes the agent architecture and retrieval mechanisms using mathematical equations and textual descriptions (e.g., in Sections 3 and 4), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | "We base our agents on the DEBiT architecture by starting from the official codebase and weights provided by the authors (https://github.com/naver/debit), and extend them as described in Section 3." While the paper refers to the codebase of the base architecture (DEBiT), it does not explicitly state that the authors of *this* paper are releasing the code for their specific contributions (RANa). |
| Open Datasets | Yes | "We use the Gibson dataset (Xia et al., 2018), consisting of 72 train and 14 eval scenes... To train and test the ObjectNav variant in Section 6, we use 80 train and 20 eval scenes of HM3DSem-v0.2 (Ramakrishnan et al., 2021; Yadav et al., 2023c)." |
| Dataset Splits | Yes | "We use the Gibson dataset (Xia et al., 2018), consisting of 72 train and 14 eval scenes... To train and test the ObjectNav variant in Section 6, we use 80 train and 20 eval scenes of HM3DSem-v0.2 (Ramakrishnan et al., 2021; Yadav et al., 2023c)." |
| Hardware Specification | Yes | "Table 4: Inference runtime of different model components, in ms, timed on one Nvidia H100-80G GPU." |
| Software Dependencies | No | Appendix C mentions "Dijkstra's algorithm (Dijkstra, 1956) (from the SciPy python package)" but does not provide a version number for SciPy or for any other key software dependency such as PyTorch or Python. |
| Experiment Setup | Yes | "Model training: All models, retrieval-augmented or not, are trained with PPO for up to 200M steps. Unless specified otherwise, the geometric FM g and the encoders x and l are loaded from DEBiT and kept frozen, as is DINOv2. For retrieval-augmented agents, the context encoder c and the policy network π are learned from scratch. ... The reward definition is inspired by the PointNav (Chattopadhyay et al., 2021) and ImageNav (Bono et al., 2024a) rewards and is given as r_t = K · 1_success − ΔGeo_t − λ, where K = 10, ΔGeo_t is the increase in geodesic distance to the goal, and the slack cost λ = 0.01 encourages efficiency. ... The context size is N = 8 in all experiments." |
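The reward formula quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a hypothetical reconstruction of the garbled equation, reading it as r_t = K · 1_success − ΔGeo_t − λ; the function name and variable names are illustrative and not taken from the paper.

```python
K = 10.0      # success bonus K from the quoted setup
SLACK = 0.01  # slack cost lambda, charged every step to encourage efficiency

def step_reward(success: bool, geodesic_increase: float) -> float:
    """Per-step reward: success bonus minus the increase in geodesic
    distance to the goal minus the slack cost (illustrative sketch)."""
    return K * float(success) - geodesic_increase - SLACK

# Example: a non-terminal step that moves 0.25 m closer to the goal
# (the geodesic distance decreases, so its "increase" is -0.25):
r = step_reward(success=False, geodesic_increase=-0.25)
```

Note that progress toward the goal enters as a negative "increase" in geodesic distance, so moving closer yields a positive shaping term while the constant slack cost penalizes long trajectories.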