Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Authors: Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our algorithm on a diverse range of language and vision-based tasks that require search and exploration. Across these tasks, IGE strongly exceeds classic reinforcement learning and graph search baselines, and also succeeds where prior state-of-the-art FM agents like Reflexion completely fail. Overall, Intelligent Go-Explore combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities. All our code is open-sourced at: https://github.com/conglu1997/intelligent-go-explore.
Researcher Affiliation | Academia | Cong Lu (1,2) EMAIL, Shengran Hu (1,2) EMAIL, Jeff Clune (1,2,3) EMAIL; 1 University of British Columbia, 2 Vector Institute, 3 Canada CIFAR AI Chair
Pseudocode | Yes | We illustrate our resultant algorithm at the top of Figure 1 and provide full pseudocode in Algorithm 1.
Open Source Code | Yes | All our code is open-sourced at: https://github.com/conglu1997/intelligent-go-explore.
Open Datasets | Yes | We first demonstrate the effectiveness of IGE in a mathematical reasoning task, Game of 24 (Yao et al., 2023a). The goal is to perform basic arithmetic operations (+, −, ×, /) starting from 4 numbers to obtain 24. ... Next, we show that IGE readily operates across multiple modalities in the BabyAI domains from Carta et al. (2023). ... Finally, we show IGE's ability to tackle tasks requiring long-horizon memory and planning, exploration, and commonsense in TextWorld (Côté et al., 2018), a classic text-based agent benchmark.
Dataset Splits | Yes | We evaluate IGE across 100 hard test problems in Figure 2.
Hardware Specification | No | We used GPT-4-Turbo for Game of 24 and GPT-4o for BabyAI and TextWorld. This was done purely to select the version of GPT-4 that was available and cheapest at the time of running the experiments. The version of GPT-4 is consistent per environment.
Software Dependencies | No | The paper mentions using specific versions of large language models (GPT-4-Turbo and GPT-4o), but does not specify other software dependencies such as the programming language (e.g., Python) or libraries (e.g., PyTorch, TensorFlow) and their version numbers.
Experiment Setup | Yes | Full hyperparameters are detailed in Appendix E. We list the hyperparameters for IGE in Table 6. We list the sampling parameters for GPT-4 (OpenAI, 2024) passed via the OpenAI API in Table 7.
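For readers unfamiliar with the Game of 24 benchmark quoted in the Open Datasets row (combine four numbers into 24 using +, −, ×, /), the task can be illustrated with a small brute-force solver. This sketch is not taken from the paper's open-sourced codebase; the function names and structure are our own illustrative assumptions.

```python
from operator import add, sub, mul, truediv

# Candidate binary operations for the Game of 24 (illustrative only).
OPS = {'+': add, '-': sub, '*': mul, '/': truediv}


def solve24(nums, target=24, eps=1e-6):
    """Return an expression string reaching `target` from the four
    numbers, or None if no combination of +, -, *, / works."""
    items = [(float(n), str(n)) for n in nums]
    return _search(items, target, eps)


def _search(items, target, eps):
    # Base case: one value left; check whether it hits the target.
    if len(items) == 1:
        val, expr = items[0]
        return expr if abs(val - target) < eps else None
    # Pick an ordered pair of remaining values, combine them with each
    # operation, and recurse on the shrunken list. Trying ordered pairs
    # covers non-commutative subtraction and division.
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            for sym, op in OPS.items():
                if sym == '/' and abs(b) < eps:
                    continue  # avoid division by (near-)zero
                combined = (op(a, b), f'({ea}{sym}{eb})')
                found = _search(rest + [combined], target, eps)
                if found:
                    return found
    return None
```

A solvable instance such as `[6, 6, 6, 6]` yields an expression like `((6+6)+(6+6))`, while an unsolvable one such as `[1, 1, 1, 1]` returns `None`. Exhaustive search like this is cheap for four numbers; the point of the benchmark is that an agent must find such combinations through its own exploration rather than enumeration.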