Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts
Authors: Sukai Huang, Nir Lipovetzky, Trevor Cohn
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments test the following hypotheses: (H1) Semantic equivalence across different representations, as discussed by Weaver, holds true in our context. (H2) Ambiguity in natural language descriptions leads to multiple interpretations. (H3) Our pipeline produces multiple solvable candidate sets of action schemas and plans without expert intervention, providing users with a range of options. (H4) Our pipeline outperforms direct LLM planning approaches in plan quality, demonstrating the advantage of integrating LLMs with symbolic planning methods. See Appendix for other experiments outside the scope of these hypotheses. |
| Researcher Affiliation | Collaboration | Sukai Huang¹, Nir Lipovetzky¹ and Trevor Cohn¹,²* — ¹The University of Melbourne, ²Google. Emails: EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology in prose and through diagrams (Figure 3) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/Sino-Huang/Official-LLMSymbolic-Planning-without-Experts |
| Open Datasets | Yes | For training and calibration of the sentence encoder, we used domains from IPC and PDDLGym (Silver and Chitnis 2020). |
| Dataset Splits | No | The paper mentions 'test domains' and 'training and calibration of the sentence encoder' but does not provide specific percentages, sample counts, or detailed methodology for dataset splits within those domains. |
| Hardware Specification | No | The paper mentions support from 'The University of Melbourne's Research Computing Services and the Petascale Campus Initiative' but does not specify exact GPU/CPU models, processor types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions specific LLM models (GLM), sentence encoders (text-embedding-3-large, sentence-t5-xl, all-roberta-large-v1), and a symbolic planner (DUAL-BWFS) but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | To ensure we explore a wide range of interpretations and effectively cover the user's intent, we utilize multiple LLM instances, denoted as $\{P_{\mathrm{LLM}}^1, P_{\mathrm{LLM}}^2, \ldots, P_{\mathrm{LLM}}^N\}$, and set their temperature hyperparameter high to encourage diverse outputs. |
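The experiment-setup quote above relies on the fact that a high sampling temperature flattens an LLM's output distribution, so repeated queries across the $N$ instances yield more diverse candidate interpretations. A minimal sketch of that mechanism, assuming a temperature-scaled softmax over candidate scores (the candidate names, logits, and `sample_interpretations` helper are hypothetical illustrations, not the paper's actual pipeline):

```python
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax: higher temperature flattens the
    distribution, so lower-scoring candidates are sampled more often."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def sample_interpretations(candidates, logits, n_instances, temperature, seed=0):
    """Hypothetical stand-in for querying N LLM instances: each draw
    mimics one instance sampling an interpretation of the user's intent."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return [rng.choices(candidates, weights=probs, k=1)[0]
            for _ in range(n_instances)]


# Illustrative candidate action-schema interpretations and their scores.
candidates = ["schema_A", "schema_B", "schema_C"]
logits = [2.0, 1.0, 0.5]

low_t = softmax(logits, temperature=0.2)   # near-deterministic: top pick dominates
high_t = softmax(logits, temperature=2.0)  # flatter: diverse interpretations survive

print(max(high_t) < max(low_t))  # True: high temperature spreads probability mass
```

With a temperature of 0.2 the top-scoring schema absorbs nearly all probability mass, whereas at 2.0 the mass spreads across candidates, which is the behavior the pipeline exploits to generate multiple solvable candidate sets.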