Language Models Speed Up Local Search for Finding Programmatic Policies
Authors: Quazi Asif Sadmine, Hendrik Baier, Levi Lelis
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and MicroRTS (Ontañón, 2013). LS-LLM was more sample efficient than the evaluated baselines, including a system that is identical to LS-LLM but does not use an LLM to initialize the search. We also show that LS-LLM's performance drops sharply once we encrypt the names of the functions and terminal symbols used in the domain-specific language, suggesting that the knowledge the LLM leverages largely comes from interpreting those names. Our results suggest that LLMs can be an economical and effective solution in guiding local search algorithms in synthesizing programmatic policies. |
| Researcher Affiliation | Academia | Quazi Asif Sadmine EMAIL Amii, Department of Computing Science University of Alberta Hendrik Baier EMAIL Eindhoven University of Technology Levi H. S. Lelis EMAIL Amii, Department of Computing Science University of Alberta |
| Pseudocode | Yes | Local Search with LLM (LS-LLM) is described in Algorithm 1, and a schematic view of it is shown in Figure 2. |
| Open Source Code | Yes | The source code used to run the experiments of this paper is available online: https://github.com/asifsadmain/LS-LLM |
| Open Datasets | Yes | We use the following maps from the MicroRTS repository (https://github.com/Farama-Foundation/MicroRTS/), with the map size in brackets: No Where To Run (9×8), Double Game (24×24), and BWDistant Resources (32×32). The images of the maps are provided in Appendix A. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and MicroRTS (Ontañón, 2013). |
| Dataset Splits | No | The paper does not provide explicit training/test/validation dataset splits in terms of percentages or sample counts. It describes game environments and evaluation procedures but not data partitioning for static datasets. |
| Hardware Specification | Yes | All experiments were run on computers with 2.6 GHz CPUs and 12 GB of RAM. |
| Software Dependencies | No | The paper mentions using "GPT 3.5" and "Microlanguage" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We used B = 5 for LS-LLM. In PR, we set the number of gates to 60 and leave the number of branches for CM unbounded. We use the value of k = 1,000 in Nk. SHC is run with a time limit of 2,000 seconds. Once SHC reaches a local optimum, if there is still time allowed to search, it restarts the search from the initial candidate the LLM has suggested. |
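The restart scheme quoted in the Experiment Setup row (stochastic hill climbing under a time budget, restarting from the LLM-suggested initial candidate at local optima) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy integer objective, and the much shorter time budget are all assumptions introduced here.

```python
import random
import time

def stochastic_hill_climbing(llm_candidate, neighbors, evaluate,
                             time_limit=0.5, k=100):
    """Hill-climb until the time budget is exhausted; on reaching a
    local optimum, restart from the candidate the LLM suggested.
    All names here are illustrative placeholders."""
    deadline = time.monotonic() + time_limit
    current = llm_candidate
    best, best_score = current, evaluate(current)
    while time.monotonic() < deadline:
        pool = neighbors(current)
        # Sample up to k neighbors (standing in for the paper's Nk).
        sampled = random.sample(pool, min(k, len(pool)))
        cand = max(sampled, key=evaluate)
        if evaluate(cand) > evaluate(current):
            current = cand
            if evaluate(current) > best_score:
                best, best_score = current, evaluate(current)
        else:
            # Local optimum reached: restart from the LLM suggestion.
            current = llm_candidate
    return best

# Toy usage: climb toward x = 10 on the integers.
best = stochastic_hill_climbing(
    llm_candidate=0,
    neighbors=lambda x: [x - 1, x + 1],
    evaluate=lambda x: -(x - 10) ** 2,
)
```

In the paper the candidates are programmatic policies and the neighborhood comes from the domain-specific language; the toy objective above only demonstrates the control flow of the restart loop.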