HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Authors: Runquan Gui, Zhihai Wang, Jie Wang, Chi Ma, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Defu Lian, Enhong Chen, Feng Wu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6% performance improvement over o1-preview. (Section 5.1, Setups) Benchmarks: To evaluate the effectiveness of our method, we select three of the most challenging planning datasets: TravelPlanner (Xie et al., 2024), PlanBench (Valmeekam et al., 2024a), and NaturalPlan (Zheng et al., 2024). (Section 5.2, Main Results) As shown in Table 1, we evaluate HTP's effectiveness across these benchmarks. (Section 5.4, Ablation Study and Additional Analysis) To assess the impact of individual HTP modules on overall performance, we conduct an ablation study using GPT-4o and Gemini-1.5-Pro as the backbone models. |
| Researcher Affiliation | Collaboration | 1) MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; 3) Noah's Ark Lab, Huawei Technologies; 4) College of Intelligence and Computing, Tianjin University; 5) State Key Laboratory of Cognitive Intelligence & University of Science and Technology of China. |
| Pseudocode | Yes | Algorithm 1: Top-down Hypertree Construction Algorithm. Input: rules R, query q, LLM π_θ, reasoning depth K, expansion width W. Convert divisible set: D ← Convert(R); initialize hypertree: H ← q; for d = 1 to K do ... |
| Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | To evaluate the effectiveness of our method, we select three of the most challenging planning datasets: TravelPlanner (Xie et al., 2024), PlanBench (Valmeekam et al., 2024a), and NaturalPlan (Zheng et al., 2024). |
| Dataset Splits | Yes | 1) TravelPlanner is a planning benchmark focused on travel planning, aiming to find an itinerary that satisfies diverse constraints regarding flights, accommodations, and other travel arrangements. In this study, we select the validation set for evaluation, which contains 180 queries and is divided into 9 groups based on difficulty levels (easy, medium, and hard) and trip durations (3, 5, and 7 days). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | Yes | For OpenAI models, we use gpt-3.5-turbo-1106 and gpt-4o-2024-08-06. For Gemini-1.5-Pro, we use Google Gemini-1.5-Pro APIs to obtain results. We set the temperature to 0 for all models. |
| Experiment Setup | Yes | We set the temperature to 0 for all models. To effectively select the optimal hyperchains from H and manage their number, inspired by tree-structured methods for limiting width, we adopt three strategies: a width-based pruning method, which restricts the total number of branches; a probability-based pruning method, where hyperchains with low confidence probabilities generated by the LLM during branching are eliminated; and an LLM-guided evaluation method, which leverages the LLM to filter and assess candidate hyperchains. |