reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GenPlan: Generative Sequence Models as Adaptive Planners

Authors: Akash Karthikeyan, Yash Vardhan Pant

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our method through multiple simulation environments. Notably, GenPlan outperforms state-of-the-art methods by over 10% on adaptive planning tasks, where the agent adapts to multitask missions while leveraging demonstrations from single-goal-reaching tasks. ... We evaluate the performance of GenPlan in BabyAI (Chevalier-Boisvert et al. 2019) and continuous manipulation tasks, focusing on the agent's adaptive and generalization capabilities. ... We conduct simulations in a modified BabyAI suite following three paradigms.
Researcher Affiliation	Academia	University of Waterloo, Canada
Pseudocode	Yes	Algorithm 1: GenPlan Training ... Algorithm 2: GenPlan Sampling
Open Source Code	Yes	Code https://github.com/CL2-UWaterloo/GenPlan
Open Datasets	Yes	We evaluate the performance of GenPlan in BabyAI (Chevalier-Boisvert et al. 2019) and continuous manipulation tasks: (a) PushT (Florence et al. 2021) and (b) Franka Kitchen (Gupta et al. 2019).
Dataset Splits	No	The model, trained on simple goal-reaching tasks, is evaluated for zero-shot adaptation across harder environments without additional finetuning. ... Success rates in reaching goals and completing tasks are reported across 250 novel environments. The map layout, goals, obstacles, and agent positions are randomized in each run.
Hardware Specification	Yes	We implemented GenPlan using Python 3.8 and trained it on a 12-core CPU alongside an RTX A6000 GPU.
Software Dependencies	No	The paper mentions 'Python 3.8' but does not list multiple key software components with their versions or a self-contained solver/specialized package with a specific version number.
Experiment Setup	Yes	The context length is typically set to 1 but can be extended to increase the agent's memory (see Figure 2B). ... The entropy lower bound β in Eq. 4b is currently a hyper-parameter that must be manually specified. ... For model and environment hyperparameters, we adopt the configurations from (Lee et al. 2024).