Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts

Authors: Sukai Huang, Nir Lipovetzky, Trevor Cohn

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments test the following hypotheses: (H1) Semantic equivalence across different representations, as discussed by Weaver, holds true in our context. (H2) Ambiguity in natural language descriptions leads to multiple interpretations. (H3) Our pipeline produces multiple solvable candidate sets of action schemas and plans without expert intervention, providing users with a range of options. (H4) Our pipeline outperforms direct LLM planning approaches in plan quality, demonstrating the advantage of integrating LLMs with symbolic planning methods. See Appendix for other experiments outside the scope of these hypotheses.
Researcher Affiliation | Collaboration | Sukai Huang (1), Nir Lipovetzky (1), and Trevor Cohn (1,2*); (1) The University of Melbourne, (2) Google. EMAIL, EMAIL
Pseudocode | No | The paper describes its methodology in prose and through diagrams (Figure 3) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/Sino-Huang/Official-LLMSymbolic-Planning-without-Experts
Open Datasets | Yes | For training and calibration of the sentence encoder, we used domains from IPC and PDDLGym (Silver and Chitnis 2020).
Dataset Splits | No | The paper mentions 'test domains' and 'training and calibration of the sentence encoder' but does not provide specific percentages, sample counts, or detailed methodology for dataset splits within those domains.
Hardware Specification | No | The paper acknowledges support from 'The University of Melbourne's Research Computing Services and the Petascale Campus Initiative' but does not specify exact GPU/CPU models, processor types, or memory amounts used for experiments.
Software Dependencies | No | The paper mentions specific LLM models (GLM), sentence encoders (text-embedding-3-large, sentence-t5-xl, all-roberta-large-v1), and a symbolic planner (DUAL-BFWS) but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | To ensure we explore a wide range of interpretations and effectively cover the user's intent, we utilize multiple LLM instances, denoted as {P_LLM^1, P_LLM^2, ..., P_LLM^N}, and set their temperature hyperparameter high to encourage diverse outputs.
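The sampling setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sample_candidate_schema` is a hypothetical stand-in for one high-temperature LLM call (a real pipeline would query an LLM API N times), and the candidate strings are invented placeholders. The point it shows is the pattern of querying N independent instances and deduplicating their outputs into a candidate set.

```python
import random

# Hypothetical stand-in for one high-temperature LLM query. A real pipeline
# would call an LLM API here; randomness simulates temperature-driven diversity.
def sample_candidate_schema(rng: random.Random) -> str:
    preconditions = rng.sample(["(clear ?x)", "(on-table ?x)", "(holding ?x)"], k=2)
    return "(:action pick-up :precondition (and %s))" % " ".join(sorted(preconditions))

def collect_candidates(n_instances: int, seed: int = 0) -> list:
    """Query n_instances independent 'LLM instances' and deduplicate the
    sampled action-schema candidates, preserving first-seen order."""
    rng = random.Random(seed)
    seen = set()
    candidates = []
    for _ in range(n_instances):
        schema = sample_candidate_schema(rng)
        if schema not in seen:
            seen.add(schema)
            candidates.append(schema)
    return candidates

if __name__ == "__main__":
    for schema in collect_candidates(n_instances=10):
        print(schema)
```

Each surviving candidate would then be passed to the symbolic planner for solvability checking, which is how the pipeline yields multiple solvable candidate sets without expert intervention.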