Breaking the Self-Evaluation Barrier: Reinforced Neuro-Symbolic Planning with Large Language Models
Authors: Jie-Jing Shao, Hong-Jie You, Guohao Cai, Quanyu Dai, Zhenhua Dong, Lan-Zhe Guo
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach significantly improves planning accuracy and constraint satisfaction across various domains, outperforming traditional self-evaluation methods. It highlights the potential of hybrid neuro-symbolic systems to address complex constrained planning tasks. (Section 4.1, Experimental Setup) We evaluate our proposal on diverse tasks, including Game of 24, Game of 28, Game of 30, Constrained Knapsack, and Travel Planning. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 2School of Artificial Intelligence, Nanjing University, Nanjing, China 3Huawei Noah's Ark Lab, Shenzhen, China 4School of Intelligence Science and Technology, Nanjing University, Nanjing, China EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The proposed RNSP |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing code, nor does it include a link to a code repository. The 'Conclusion and Discussion' section mentions future work related to a reward model but does not refer to the current work's code release. |
| Open Datasets | Yes | For the game of 24, we follow the [Yao et al., 2023a] and collect the data from 4nums.com, a website hosting mathematical games, specifically selecting 1,362 games sorted by human solving time from easy to hard. We further conduct the experiments on a real-world planning benchmark Travel Planner [Xie et al., 2024]. |
| Dataset Splits | Yes | The samples indexed 800-900 are utilized to train, and the samples indexed 901-1000 are utilized to test. For the Game of 28 and Game of 30... The scale of the training and testing data is the same as that for the Game of 24, with each consisting of 100 problems. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using specific Large Language Models (LLMs) such as DeepSeek-V3, GPT-4o, GPT-4o-mini, and GPT-3.5-turbo, but it does not specify version numbers for any ancillary software, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | We set a beam width $B$ to control the complexity of the search. This structured search ensures that computational resources are efficiently used to explore feasible solutions. Given a state $s_t$ in the current candidates, the LLMs $\phi$ are employed to generate $K$ action candidates $[a_t^1, a_t^2, \ldots, a_t^K] \sim g(a_t \mid s_t, \phi)$. |
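The search loop described in the Experiment Setup row can be sketched as a generic beam search. This is a minimal illustration, not the authors' RNSP implementation: in the paper the `propose` callable would be an LLM sampling $K$ action candidates and `score` a learned or symbolic evaluator, whereas the toy arithmetic scorer in the usage below is purely hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def beam_search(
    initial_state: S,
    propose: Callable[[S], List[A]],    # stands in for LLM sampling g(a_t | s_t, phi)
    apply_action: Callable[[S, A], S],  # state transition
    score: Callable[[S], float],        # state evaluation; higher is better
    beam_width: int = 3,                # B: states kept after each step
    num_candidates: int = 5,            # K: action candidates per state
    depth: int = 3,                     # search horizon
) -> S:
    """Plain beam search; returns the best state reached at the final depth."""
    beam: List[Tuple[float, S]] = [(score(initial_state), initial_state)]
    for _ in range(depth):
        expanded: List[Tuple[float, S]] = []
        for _, state in beam:
            for action in propose(state)[:num_candidates]:
                nxt = apply_action(state, action)
                expanded.append((score(nxt), nxt))
        if not expanded:
            break
        # Prune to the top-B states to bound the search complexity.
        expanded.sort(key=lambda pair: pair[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0][1]


# Toy usage: reach a value close to a target of 10 by adding 1, 2, or 3
# per step (an assumed stand-in task, unrelated to the paper's benchmarks).
best = beam_search(
    initial_state=0,
    propose=lambda s: [1, 2, 3],
    apply_action=lambda s, a: s + a,
    score=lambda s: -abs(10 - s),
    beam_width=2,
    num_candidates=3,
    depth=3,
)
```

With three steps of at most +3 each, the closest reachable value to 10 is 9, which the search finds by keeping the two highest-scoring states at each depth.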