Voyager: An Open-Ended Embodied Agent with Large Language Models
Authors: Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3× more unique items, travels 2.3× longer distances, and unlocks key tech tree milestones up to 15.3× faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. The paper also includes a dedicated 'Experiments' section (Section 3) with subsections for 'Experimental Setup', 'Baselines', 'Evaluation Results', and 'Ablation Studies', all indicating empirical validation. |
| Researcher Affiliation | Collaboration | NVIDIA, Caltech, UT Austin, Stanford, UW Madison. The affiliations include both NVIDIA (an industry entity) and universities (Caltech, UT Austin, Stanford, UW Madison), indicating a collaboration between industry and academia. |
| Pseudocode | Yes | The pseudocode of the Voyager algorithm is shown in Pseudocode 1. |
| Open Source Code | No | The paper mentions 'MineDojo (Fan et al., 2022), an open-source Minecraft AI framework' and 'https://voyager.minedojo.org'. However, it does not provide an explicit statement from the authors that *their own code for Voyager* is released, nor a direct link to a code repository for their implementation. The provided URL is a project page, not a specific code repository. |
| Open Datasets | Yes | We evaluate Voyager systematically against other LLM-based agent techniques (e.g., ReAct (Yao et al., 2022), Reflexion (Shinn et al., 2023), AutoGPT (Richards, 2023)) in MineDojo (Fan et al., 2022), an open-source Minecraft AI framework. MineDojo is cited as 'MineDojo: Building open-ended embodied agents with internet-scale knowledge.' (Fan et al., 2022). Other datasets like 'MineRL: A large-scale dataset of Minecraft demonstrations.' (Guss et al., 2019b) are also cited. |
| Dataset Splits | No | The paper mentions experimental runs like 'We run three trials for each method.' and scenarios like 'To evaluate zero-shot generalization, we clear the agent's inventory, reset it to a newly instantiated world, and test it with unseen tasks.' but does not specify any conventional training/test/validation dataset splits (e.g., percentages, sample counts, or predefined splits) for reproducing data partitioning. |
| Hardware Specification | No | The paper states, 'We leverage OpenAI's gpt-4-0314 (OpenAI, 2023) and gpt-3.5-turbo-0301 (chatgpt) APIs for text completion, along with text-embedding-ada-002 (embedding) API for text embedding.' This specifies the APIs used, which are cloud-based services, but does not provide details about the specific hardware (e.g., GPU models, CPU types) on which the experiments were run or the Minecraft simulation was executed. |
| Software Dependencies | Yes | We leverage OpenAI's gpt-4-0314 (OpenAI, 2023) and gpt-3.5-turbo-0301 (chatgpt) APIs for text completion, along with text-embedding-ada-002 (embedding) API for text embedding. Our simulation environment is built on top of MineDojo (Fan et al., 2022) and utilizes Mineflayer (PrismarineJS, 2013) JavaScript APIs for motor controls. |
| Experiment Setup | Yes | We set all temperatures to 0 except for the automatic curriculum, which uses temperature = 0.1 to encourage task diversity. If the bot dies, it is resurrected near the closest ground, and its inventory is preserved for uninterrupted exploration. The bot recycles its crafting table and furnace after program execution. Appendix A.2.3 and Table A.1 also detail a 'Warm-up schedule' for incorporating information into prompts, specifying the number of tasks completed before certain information is used. |
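The skill-library reuse noted above relies on embedding-based retrieval (the paper uses the text-embedding-ada-002 API). A minimal sketch of the idea, with a toy bag-of-words embedding standing in for the real embedding API and illustrative skill names not taken from the authors' code:

```python
# Hedged sketch of embedding-based skill retrieval, as in Voyager's skill
# library. A toy bag-of-words "embedding" replaces text-embedding-ada-002;
# skill names and descriptions below are hypothetical examples.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding API: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_skill(query: str, skills: dict) -> str:
    """Return the stored skill whose description best matches the task query."""
    q = embed(query)
    return max(skills, key=lambda name: cosine(q, embed(skills[name])))

skills = {
    "craftWoodenPickaxe": "craft a wooden pickaxe from planks and sticks",
    "mineIronOre": "mine iron ore with a stone pickaxe",
    "cookMeat": "cook raw meat in a furnace",
}
print(top_skill("obtain iron ore", skills))  # -> mineIronOre
```

In the actual system the query and skill descriptions would be embedded by the OpenAI API rather than word counts, but the retrieval step (nearest neighbor by cosine similarity) is the same shape.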
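The temperature settings reported in the Experiment Setup row can be captured in a small configuration sketch. This is an illustrative reconstruction, not the authors' code; the component names are hypothetical, while the model name and temperature values come from the paper:

```python
# Hypothetical sketch of the per-component sampling settings the paper
# reports: temperature 0 everywhere except the automatic curriculum, which
# uses 0.1 to encourage task diversity. Component names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    model: str
    temperature: float

def component_config(component: str) -> LLMConfig:
    """Sampling settings for a named Voyager component (illustrative)."""
    # gpt-4-0314 for text completion (per the paper); only the curriculum
    # samples with nonzero temperature.
    temperature = 0.1 if component == "automatic_curriculum" else 0.0
    return LLMConfig(model="gpt-4-0314", temperature=temperature)

print(component_config("automatic_curriculum"))  # temperature=0.1
print(component_config("skill_library"))         # temperature=0.0
```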