Language Models Speed Up Local Search for Finding Programmatic Policies

Authors: Quazi Asif Sadmine, Hendrik Baier, Levi Lelis

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results in three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and Micro RTS (Ontañón, 2013). LS-LLM was more sample efficient than the evaluated baselines, including a system that is identical to LS-LLM but does not use an LLM to initialize the search. We also show that LS-LLM's performance drops sharply once we encrypt the names of the functions and terminal symbols used in the domain-specific language, suggesting that the knowledge the LLM leverages largely comes from interpreting those names. Our results suggest that LLMs can be an economical and effective solution in guiding local search algorithms in synthesizing programmatic policies.
Researcher Affiliation Academia: Quazi Asif Sadmine (EMAIL), Amii, Department of Computing Science, University of Alberta; Hendrik Baier (EMAIL), Eindhoven University of Technology; Levi H. S. Lelis (EMAIL), Amii, Department of Computing Science, University of Alberta
Pseudocode Yes Local Search with LLM (LS-LLM) is described in Algorithm 1, and a schematic view of it is shown in Figure 2.
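The core idea the report attributes to LS-LLM, using LLM-generated candidates to initialize local search, can be illustrated with a minimal sketch. Here `llm_propose` and `evaluate` are hypothetical stand-ins for the paper's LLM query and policy-evaluation steps, not the authors' actual interface; the paper's Algorithm 1 should be consulted for the real procedure.

```python
def best_llm_seed(llm_propose, evaluate, prompt, B=5):
    """Query the LLM for B candidate programs and return the
    highest-scoring one as the starting point for local search.
    (Sketch only; llm_propose/evaluate are assumed interfaces.)"""
    candidates = [llm_propose(prompt) for _ in range(B)]
    return max(candidates, key=evaluate)

# toy usage with a stubbed "LLM" that yields fixed candidate programs
proposals = iter(["a", "abc", "ab", "abcd", "x"])
seed = best_llm_seed(lambda p: next(proposals), len, "synthesize a policy", B=5)
```

The returned seed would then be handed to the local search procedure described in Algorithm 1.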
Open Source Code Yes The source code used to run the experiments of this paper is available online: https://github.com/asifsadmain/LS-LLM
Open Datasets Yes We use the following maps from the MicroRTS repository (https://github.com/Farama-Foundation/MicroRTS/), with the map size in brackets: No Where To Run (9×8), Double Game (24×24), and BWDistant Resources (32×32). The images of the maps are provided in Appendix A. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and Micro RTS (Ontañón, 2013).
Dataset Splits No The paper does not provide explicit training/test/validation dataset splits in terms of percentages or sample counts. It describes game environments and evaluation procedures but not data partitioning for static datasets.
Hardware Specification Yes All experiments were run on computers with 2.6 GHz CPUs and 12 GB of RAM.
Software Dependencies No The paper mentions using "GPT 3.5" and "Microlanguage" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup Yes We used B = 5 for LS-LLM. In PR, we set the number of gates to 60 and leave the number of branches for CM unbounded. We use the value of k = 1,000 in Nk. SHC is run with a time limit of 2,000 seconds. Once SHC reaches a local optimum, if there is still time allowed to search, it restarts the search from the initial candidate the LLM has suggested.
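The restart behavior described above (hill climbing that, at a local optimum, restarts from the LLM-suggested initial candidate rather than a random one) can be sketched as follows. This is a hedged illustration, not the paper's implementation: `mutate`, `score`, and the eval-budget accounting are assumed stand-ins for the paper's neighborhood function, policy evaluation, and time limit.

```python
import random

def shc_with_llm_restart(initial, mutate, score, budget=2000, k=50):
    """Stochastic hill climbing sketch: sample up to k random neighbors
    per step; on a local optimum, restart from the LLM-suggested
    initial candidate. The budget counts sampled neighbors."""
    best, best_s = initial, score(initial)
    cur, cur_s = best, best_s
    evals = 1
    while evals < budget:
        # sample a batch of random neighbors of the current candidate
        batch = [mutate(cur) for _ in range(min(k, budget - evals))]
        evals += len(batch)
        cand = max(batch, key=score)
        cand_s = score(cand)
        if cand_s > cur_s:
            cur, cur_s = cand, cand_s
        else:
            # local optimum: restart from the LLM's initial suggestion
            cur, cur_s = initial, score(initial)
        if cur_s > best_s:
            best, best_s = cur, cur_s
    return best, best_s

# toy usage: maximize the number of 1s in a bit vector, where the
# "LLM suggestion" is the all-zeros vector and a neighbor flips one bit
random.seed(0)
def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best, s = shc_with_llm_restart([0] * 10, flip_one, sum, budget=2000, k=20)
```

In the paper's setting the restart point is a program the LLM proposed, so restarting there keeps the search near a promising region instead of a random one.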