Language Models Speed Up Local Search for Finding Programmatic Policies

Authors: Quazi Asif Sadmine, Hendrik Baier, Levi Lelis

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results in three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and Micro RTS (Ontañón, 2013). LS-LLM was more sample efficient than the evaluated baselines, including a system that is identical to LS-LLM but does not use an LLM to initialize the search. We also show that LS-LLM's performance drops sharply once we encrypt the names of the functions and terminal symbols used in the domain-specific language, suggesting that the knowledge the LLM leverages largely comes from interpreting those names. Our results suggest that LLMs can be an economical and effective solution in guiding local search algorithms in synthesizing programmatic policies.
Researcher Affiliation Academia: Quazi Asif Sadmine (EMAIL), Amii, Department of Computing Science, University of Alberta; Hendrik Baier (EMAIL), Eindhoven University of Technology; Levi H. S. Lelis (EMAIL), Amii, Department of Computing Science, University of Alberta
Pseudocode Yes Local Search with LLM (LS-LLM) is described in Algorithm 1, and a schematic view of it is shown in Figure 2.
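The core idea the report attributes to LS-LLM, using LLM-generated candidates to initialize local search, can be illustrated with a minimal sketch. Here `llm_propose` and `evaluate` are hypothetical stand-ins for the paper's LLM query and policy-evaluation steps, not the authors' actual interface; the paper's Algorithm 1 should be consulted for the real procedure.

```python
def best_llm_seed(llm_propose, evaluate, prompt, B=5):
    """Query the LLM for B candidate programs and return the
    highest-scoring one as the starting point for local search.
    (Sketch only; llm_propose/evaluate are assumed interfaces.)"""
    candidates = [llm_propose(prompt) for _ in range(B)]
    return max(candidates, key=evaluate)

# toy usage with a stubbed "LLM" that yields fixed candidate programs
proposals = iter(["a", "abc", "ab", "abcd", "x"])
seed = best_llm_seed(lambda p: next(proposals), len, "synthesize a policy", B=5)
```

The returned seed would then be handed to the local search procedure described in Algorithm 1.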
Open Source Code Yes The source code used to run the experiments of this paper is available online: https://github.com/asifsadmain/LS-LLM
Open Datasets Yes We use the following maps from the MicroRTS repository (https://github.com/Farama-Foundation/MicroRTS/), with the map size in brackets: No Where To Run (9×8), Double Game (24×24), and BWDistant Resources (32×32). The images of the maps are provided in Appendix A. We evaluated our hypothesis on three deterministic domains that are challenging for current systems that generate programmatic policies: Poachers & Rangers (PR) and Climbing Monkey (CM) (Moraes et al., 2023), and Micro RTS (Ontañón, 2013).
Dataset Splits No The paper does not provide explicit training/test/validation dataset splits in terms of percentages or sample counts. It describes game environments and evaluation procedures but not data partitioning for static datasets.
Hardware Specification Yes All experiments were run on computers with 2.6 GHz CPUs and 12 GB of RAM.
Software Dependencies No The paper mentions using "GPT 3.5" and "Microlanguage" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup Yes We used B = 5 for LS-LLM. In PR, we set the number of gates to 60 and leave the number of branches for CM unbounded. We use the value of k = 1,000 in Nk. SHC is run with a time limit of 2,000 seconds. Once SHC reaches a local optimum, if there is still time allowed to search, it restarts the search from the initial candidate the LLM has suggested.
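The restart behavior described above (hill climbing that, at a local optimum, restarts from the LLM-suggested initial candidate rather than a random one) can be sketched as follows. This is a hedged illustration, not the paper's implementation: `mutate`, `score`, and the eval-budget accounting are assumed stand-ins for the paper's neighborhood function, policy evaluation, and time limit.

```python
import random

def shc_with_llm_restart(initial, mutate, score, budget=2000, k=50):
    """Stochastic hill climbing sketch: sample up to k random neighbors
    per step; on a local optimum, restart from the LLM-suggested
    initial candidate. The budget counts sampled neighbors."""
    best, best_s = initial, score(initial)
    cur, cur_s = best, best_s
    evals = 1
    while evals < budget:
        # sample a batch of random neighbors of the current candidate
        batch = [mutate(cur) for _ in range(min(k, budget - evals))]
        evals += len(batch)
        cand = max(batch, key=score)
        cand_s = score(cand)
        if cand_s > cur_s:
            cur, cur_s = cand, cand_s
        else:
            # local optimum: restart from the LLM's initial suggestion
            cur, cur_s = initial, score(initial)
        if cur_s > best_s:
            best, best_s = cur, cur_s
    return best, best_s

# toy usage: maximize the number of 1s in a bit vector, where the
# "LLM suggestion" is the all-zeros vector and a neighbor flips one bit
random.seed(0)
def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best, s = shc_with_llm_restart([0] * 10, flip_one, sum, budget=2000, k=20)
```

In the paper's setting the restart point is a program the LLM proposed, so restarting there keeps the search near a promising region instead of a random one.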