Core Challenges in Embodied Vision-Language Planning
Authors: Jonathan Francis, Nariaki Kitamura, Felix Labelle, Xiaopeng Lu, Ingrid Navarro, Jean Oh
JAIR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. |
| Researcher Affiliation | Collaboration | Jonathan Francis, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA; Nariaki Kitamura, Komatsu Ltd., 2-3-6 Akasaka, Minato-ku, Tokyo, Japan; Felix Labelle, Xiaopeng Lu, Ingrid Navarro, and Jean Oh, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA |
| Pseudocode | No | The paper describes various modeling approaches and planning strategies conceptually (e.g., in Section 3.1.4 'Modeling Action-Generation and Planning'), but it does not present any structured pseudocode blocks or algorithms. |
| Open Source Code | No | This paper is a survey and does not present new methodology that would require a code release. Therefore, it does not provide concrete access to source code for its own content. |
| Open Datasets | Yes | In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks... We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. ... Section 4 presents the datasets and evaluation metrics currently used by the research community. ... Table 7: Summary of EVLP dataset statistics. |
| Dataset Splits | Yes | The EVLP community already has several datasets that would facilitate this analysis, which broadly fall into two general categories: path concatenation and path decomposition. Path concatenation, as in R4R, R6R, and R8R (Jain et al., 2019; Zhu et al., 2020b), works by joining paths that start and terminate near one another to generate longer paths. Agents can be trained on these longer paths or evaluated over them (Jain et al., 2019). Rather than building longer paths from the same dataset, path decomposition breaks down the path into fine-grained instructions (Li et al., 2020b; Zhu et al., 2020b). The agent is trained over those and then evaluated over the larger dataset with longer instructions. Note that path concatenation and decomposition are not mutually exclusive (Zhu et al., 2020b). |
| Hardware Specification | No | This paper is a survey and does not present new experimental results requiring specific hardware. Therefore, it does not provide details about hardware specifications for running its own experiments. |
| Software Dependencies | No | This paper is a survey and does not present new methodology that would require specific software dependencies with version numbers for its own implementation. It discusses various software components and architectures used in the surveyed papers, but not as dependencies for this paper's content. |
| Experiment Setup | No | This paper is a survey and does not present new experimental results requiring a specific setup. Therefore, it does not provide details about hyperparameters or system-level training settings for its own experiments. |
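
The two dataset-construction strategies quoted in the Dataset Splits row can be illustrated with a minimal Python sketch. It is not the procedure used to build R4R, R6R, or R8R, and it is not code from the paper. The `Path` type, the 2D waypoint representation, the `max_gap` distance threshold, and the `cut_indices` argument are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Path:
    """Hypothetical container: a trajectory plus its language instruction."""
    waypoints: tuple   # sequence of (x, y) positions; a simplification
    instruction: str


def concatenate_paths(paths, max_gap=1.0):
    """Path concatenation: join ordered pairs (a, b) where a ends near b's start.

    `max_gap` is an assumed threshold, not a value taken from the paper.
    """
    joined = []
    for a in paths:
        for b in paths:
            if a is b:
                continue
            if dist(a.waypoints[-1], b.waypoints[0]) <= max_gap:
                joined.append(
                    Path(a.waypoints + b.waypoints,
                         a.instruction + " " + b.instruction)
                )
    return joined


def decompose_path(path, cut_indices, sub_instructions):
    """Path decomposition: split one path into segments with finer instructions.

    `cut_indices` are waypoint indices where one sub-instruction ends and the
    next begins; adjacent segments share the waypoint at each cut. How such
    alignments are obtained is outside the scope of this sketch.
    """
    bounds = [0, *cut_indices, len(path.waypoints) - 1]
    if len(sub_instructions) != len(bounds) - 1:
        raise ValueError("need exactly one sub-instruction per segment")
    return [
        Path(path.waypoints[start:end + 1], text)
        for start, end, text in zip(bounds, bounds[1:], sub_instructions)
    ]
```

Under these assumptions, an agent could be trained on the segments returned by `decompose_path` and evaluated on the longer paths returned by `concatenate_paths`. Applying both functions to the same data reflects the quoted remark that the two strategies are not mutually exclusive.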