Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
Authors: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Lin Ma, Xiaodan Liang, Kwan-Yee K. Wong
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the challenging R2R-CE and RxR-CE datasets show that AO-Planner achieves state-of-the-art zero-shot performance (8.8% improvement on SPL). Our method can also serve as a data annotator to obtain pseudo-labels, distilling its waypoint prediction ability into a learning-based predictor. This new predictor does not require any waypoint data from the simulator and achieves 47% SR, competitive with supervised methods. |
| Researcher Affiliation | Collaboration | 1The University of Hong Kong 2Shenzhen Campus of Sun Yat-sen University 3Meituan |
| Pseudocode | No | The paper describes methods and processes verbally and through figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code. It mentions "For implementation details and more results, please refer to the appendices in our arXiv version paper (Chen et al. 2024a)", which refers to details and results, not code availability. |
| Open Datasets | Yes | We conduct experiments on the challenging R2R-CE (Krantz et al. 2020) and RxR-CE (Ku et al. 2020) datasets. R2R-CE is derived from the discrete path annotations from the R2R dataset (Anderson et al. 2018) and is converted into continuous environments with the Habitat simulator (Savva et al. 2019). |
| Dataset Splits | Yes | evaluating AO-Planner on the entire validation unseen set of R2R-CE and a randomly sampled subset of 500 cases from the validation unseen set of RxR-CE. To save API costs, we also additionally sample a subset containing 100 cases from the validation unseen set of R2R-CE for ablation study. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using GPT-4, Gemini-1.5-Pro, Grounded SAM, Grounding DINO, and the Segment Anything Model. However, it does not provide specific version numbers for these or for any other software libraries or programming languages used in the implementation. |
| Experiment Setup | Yes | In our framework, we set N = 4 and collect non-overlapping views from the front, back, left, and right directions as the observation, i.e., O_t = {V_t^i}, i = 1..4. For the action space, the VLN-CE task defines four parameterized low-level actions, namely FORWARD (0.25m), ROTATE LEFT/RIGHT (15°), and STOP. In the environment, we set the FOV of the agent's camera to 90 degrees and collect observations from four directions in counterclockwise order, namely front, left, back, and right views. |
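The Experiment Setup row describes the agent's parameterized action space and its four-view panoramic observation. The following is a minimal, hypothetical sketch (not the authors' code) of those two pieces: an enum for the four VLN-CE low-level actions with the stated step sizes (FORWARD 0.25m, ROTATE 15°), and helpers that compute the headings of four non-overlapping 90° views collected counterclockwise, plus how many 15° rotations realize a given turn. All function and constant names here are illustrative assumptions.

```python
from enum import Enum

class Action(Enum):
    """The four parameterized low-level actions defined by VLN-CE."""
    STOP = 0
    FORWARD = 1       # move forward 0.25 m
    ROTATE_LEFT = 2   # turn left 15 degrees
    ROTATE_RIGHT = 3  # turn right 15 degrees

FORWARD_STEP_M = 0.25   # FORWARD step size from the paper
ROTATE_STEP_DEG = 15    # ROTATE step size from the paper
CAMERA_FOV_DEG = 90     # camera field of view from the paper

def view_headings(agent_heading_deg: float, n_views: int = 4):
    """Headings (degrees) of N non-overlapping views collected
    counterclockwise starting from the front (front, left, back, right),
    assuming heading increases counterclockwise."""
    assert n_views * CAMERA_FOV_DEG == 360, "views must tile the panorama"
    return [(agent_heading_deg + i * CAMERA_FOV_DEG) % 360
            for i in range(n_views)]

def rotations_to_face(delta_deg: float) -> int:
    """Number of 15-degree ROTATE actions needed to turn by delta_deg."""
    return round((delta_deg % 360) / ROTATE_STEP_DEG)
```

For example, `view_headings(0.0)` yields the four view headings 0°, 90°, 180°, and 270°, and turning 90° to face the left view takes six 15° rotations.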