InnateCoder: Learning Programmatic Options with Foundation Models
Authors: Rubens O. Moraes, Quazi Asif Sadmine, Hendrik Baier, Levi H. S. Lelis
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in MicroRTS and Karel the Robot support our hypothesis, since they show that INNATECODER is more sample-efficient than versions of the system that do not use options or learn them from experience. |
| Researcher Affiliation | Academia | Rubens O. Moraes¹, Quazi Asif Sadmine²·³, Hendrik Baier⁴·⁵, and Levi H. S. Lelis²·³ — ¹Departamento de Informática, Universidade Federal de Viçosa; ²Department of Computing Science, University of Alberta; ³Alberta Machine Intelligence Institute (Amii); ⁴Information Systems, Eindhoven University of Technology; ⁵Centrum Wiskunde & Informatica, Amsterdam |
| Pseudocode | No | The paper describes the system components and methods in text and schematic diagrams (Figure 2), but does not contain a clearly labeled pseudocode or algorithm block. Figure 1 shows a context-free grammar, which is not pseudocode. |
| Open Source Code | Yes | INNATECODER is available at https://github.com/rubensolv/InnateCoder. |
| Open Datasets | Yes | We evaluated INNATECODER on MicroRTS [Ontañón, 2017] and Karel the Robot [Pattis, 1994]. For MicroRTS, we use the following maps from the MicroRTS repository, with the map size in brackets: NoWhereToRun (9×8), basesWorkers (24×24), BWDistantResources (32×32), and BloodBath (64×64). We use the following Karel problems, from previous works [Trivedi et al., 2021; Liu et al., 2023]: Stair Climber, Four Corners, Top Off, Maze, Clean House, Harvester, Door Key, One Stroke, Seeder, and Snake. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments (MicroRTS and Karel the Robot) and evaluates performance with metrics such as winning rate and episodic return over games played or episodes. It refers to initial state distributions (µ) and to rolling out policies from initial states (s0), but does not specify traditional dataset splits (e.g., train/test/validation percentages or counts) for a static dataset. |
| Hardware Specification | Yes | All experiments were run on 2.6 GHz CPUs with 12 GB of RAM. The research was carried out using computational resources from the Digital Research Alliance of Canada and the UFV Cluster. |
| Software Dependencies | Yes | We use OpenAI's API for GPT-4o, whose training cut-off date is October 2023. We also perform tests, for MicroRTS, using the Llama 3.1 model with 405 billion parameters, whose training cut-off is December 2021. |
| Experiment Setup | Yes | We use k = 1,000 in the neighborhood function. In MicroRTS, SHC is run with a restarting time limit of 2,000 seconds for each self-play iteration. We use ε = 0.4 in our experiments. We perform 30 independent runs (seeds) of each system, including the generation of the programs by the model. We do this until we have at least 300 and at most 700 states in S. |
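The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a hypothetical illustration only: the variable and function names below do not come from the paper's code, and only the numeric values are taken from the quoted text.

```python
# Hypothetical configuration collecting the hyperparameters quoted above.
# Only the numeric values come from the paper; all names are illustrative.
EXPERIMENT_CONFIG = {
    "neighborhood_k": 1000,       # k in the neighborhood function
    "shc_restart_limit_s": 2000,  # SHC restart time limit per self-play iteration (MicroRTS)
    "epsilon": 0.4,               # exploration parameter ε
    "num_seeds": 30,              # independent runs per system
    "state_pool_min": 300,        # lower bound on the number of states in S
    "state_pool_max": 700,        # upper bound on the number of states in S
}

def state_pool_complete(num_states: int, cfg: dict = EXPERIMENT_CONFIG) -> bool:
    """Return True once the collected state pool S has an acceptable size,
    i.e., between the stated minimum (300) and maximum (700), inclusive."""
    return cfg["state_pool_min"] <= num_states <= cfg["state_pool_max"]
```

A check like `state_pool_complete(len(S))` would gate the state-collection loop described in the quote ("until we have at least 300 and at most 700 states in S").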