InnateCoder: Learning Programmatic Options with Foundation Models
Authors: Rubens O. Moraes, Quazi Asif Sadmine, Hendrik Baier, Levi H. S. Lelis
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in MicroRTS and Karel the Robot support our hypothesis, since they show that INNATECODER is more sample-efficient than versions of the system that do not use options or learn them from experience. |
| Researcher Affiliation | Academia | Rubens O. Moraes¹, Quazi Asif Sadmine²·³, Hendrik Baier⁴·⁵, and Levi H. S. Lelis²·³ — ¹Departamento de Informática, Universidade Federal de Viçosa; ²Department of Computing Science, University of Alberta; ³Alberta Machine Intelligence Institute (Amii); ⁴Information Systems, Eindhoven University of Technology; ⁵Centrum Wiskunde & Informatica, Amsterdam |
| Pseudocode | No | The paper describes the system components and methods in text and schematic diagrams (Figure 2), but does not contain a clearly labeled pseudocode or algorithm block. Figure 1 shows a context-free grammar, which is not pseudocode. |
| Open Source Code | Yes | INNATECODER is available at https://github.com/rubensolv/InnateCoder. |
| Open Datasets | Yes | We evaluated INNATECODER on MicroRTS [Ontañón, 2017] and Karel the Robot [Pattis, 1994]. For MicroRTS, we use the following maps from the MicroRTS repository, with the map size in brackets: NoWhereToRun (9×8), basesWorkers (24×24), BWDistantResources (32×32), and BloodBath (64×64). We use the following Karel problems, from previous works [Trivedi et al., 2021; Liu et al., 2023]: Stair Climber, Four Corners, Top Off, Maze, Clean House, Harvester, Door Key, One Stroke, Seeder, and Snake. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments (MicroRTS and Karel the Robot) and evaluates performance with metrics such as winning rate and episodic return over games played or episodes. It refers to initial state distributions (µ) and to rolling out policies from initial states (s0), but does not specify traditional dataset splits (e.g., train/test/validation percentages or counts) for a static dataset. |
| Hardware Specification | Yes | All experiments were run on 2.6 GHz CPUs with 12 GB of RAM. The research was carried out using computational resources from the Digital Research Alliance of Canada and the UFV Cluster. |
| Software Dependencies | Yes | We use OpenAI's API for GPT-4o, whose training cut-off date is October 2023. We also perform tests, for MicroRTS, using the Llama 3.1 model with 405 billion parameters, whose training cut-off is December 2021. |
| Experiment Setup | Yes | We use k = 1,000 in the neighborhood function. In MicroRTS, SHC is run with a restarting time limit of 2,000 seconds for each self-play iteration. We use ε = 0.4 in our experiments. We perform 30 independent runs (seeds) of each system, including the generation of the programs by the model. We do this until we have at least 300 and at most 700 states in S. |
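The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a hypothetical illustration only: the variable and function names below do not come from the paper's code, and only the numeric values are taken from the quoted text.

```python
# Hypothetical configuration collecting the hyperparameters quoted above.
# Only the numeric values come from the paper; all names are illustrative.
EXPERIMENT_CONFIG = {
    "neighborhood_k": 1000,       # k in the neighborhood function
    "shc_restart_limit_s": 2000,  # SHC restart time limit per self-play iteration (MicroRTS)
    "epsilon": 0.4,               # exploration parameter ε
    "num_seeds": 30,              # independent runs per system
    "state_pool_min": 300,        # lower bound on the number of states in S
    "state_pool_max": 700,        # upper bound on the number of states in S
}

def state_pool_complete(num_states: int, cfg: dict = EXPERIMENT_CONFIG) -> bool:
    """Return True once the collected state pool S has an acceptable size,
    i.e., between the stated minimum (300) and maximum (700), inclusive."""
    return cfg["state_pool_min"] <= num_states <= cfg["state_pool_max"]
```

A check like `state_pool_complete(len(S))` would gate the state-collection loop described in the quote ("until we have at least 300 and at most 700 states in S").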