Learn to Think: Bootstrapping LLM Logic Through Graph Representation Learning
Authors: Hang Gao, Chenhao Zhang, Tie Wang, Junsuo Zhao, Fengge Wu, Changwen Zheng, Huaping Liu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that this method significantly improves reasoning performance across multiple tasks without requiring additional training or task-specific prompt design. |
| Researcher Affiliation | Academia | Hang Gao (1,2), Chenhao Zhang (1,2,3), Tie Wang (4), Junsuo Zhao (1,2,3), Fengge Wu (1,2,3), Changwen Zheng (1,2,3), Huaping Liu (5). 1: Institute of Software, Chinese Academy of Sciences; 2: National Key Laboratory of Space Integrated Information System; 3: University of Chinese Academy of Sciences; 4: Peking University; 5: Tsinghua University. |
| Pseudocode | No | The paper describes the method in prose and flow diagrams (Figure 3) but does not include a dedicated pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code can be found in https://github.com/zch65458525/L2T. |
| Open Datasets | Yes | Tasks We evaluated our method on four distinct tasks: Sudoku, the Game of 24, Truth Quest [Mondorf and Plank, 2024], and Creative Writing. |
| Dataset Splits | No | Min and Max represent the best and worst performances achieved by a method, respectively, in terms of the number of correct solutions out of 13 total puzzle sets. The paper describes problem sets for evaluation but does not specify train/validation/test splits for model training or evaluation. |
| Hardware Specification | No | We utilized the GPT-4o API to conduct all the experiments, including those for the baselines. The paper does not specify any particular hardware used for running experiments or the GNN module. |
| Software Dependencies | No | We utilized the GPT-4o API to conduct all the experiments, including those for the baselines. No specific software versions for frameworks (e.g., PyTorch, TensorFlow) or libraries used for the GNN are mentioned. |
| Experiment Setup | Yes | For the implementation of g(·), we utilize a one-layer Graph Convolutional Network (GCN) [Kipf and Welling, 2017] followed by a two-layer Multi-Layer Perceptron (MLP). We adopt the widely used PPO framework [Schulman et al., 2017] for LLM training as the specific implementation of the Actor-Critic algorithm, optimizing and updating the Actor and Critic that we have constructed. The reward r_k is set to 100 if the generated thought represents the final result. Otherwise, it is an integer between 0 and 10, determined by the LLM based on G(k) and X_eva. |
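The setup row describes a thought scorer g(·) built from a one-layer GCN followed by a two-layer MLP, plus a simple reward rule. A minimal NumPy sketch of that architecture follows; the function names, dimensions, readout choice (mean pooling), and random weights are illustrative assumptions, not the authors' implementation (which is in the linked repository):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

def mlp2(h, W1, b1, W2, b2):
    """Two-layer MLP head applied to pooled graph features."""
    return np.maximum(h @ W1 + b1, 0.0) @ W2 + b2

def score_thought_graph(A, X, params):
    """g(.): one-layer GCN -> mean-pool readout -> two-layer MLP -> scalar."""
    H = gcn_layer(A, X, params["Wg"])
    pooled = H.mean(axis=0)                 # graph-level readout (assumed)
    return mlp2(pooled, params["W1"], params["b1"],
                params["W2"], params["b2"]).item()

def reward(is_final, llm_score):
    """Paper's reward rule: 100 for a final result, else an int in [0, 10]."""
    return 100 if is_final else int(np.clip(round(llm_score), 0, 10))

# Toy example: a 4-node chain of thoughts with random features/weights.
rng = np.random.default_rng(0)
n, d_in, d_g, d_h = 4, 8, 16, 32
params = {
    "Wg": rng.normal(size=(d_in, d_g)),
    "W1": rng.normal(size=(d_g, d_h)), "b1": np.zeros(d_h),
    "W2": rng.normal(size=(d_h, 1)),   "b2": np.zeros(1),
}
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(n, d_in))
print(score_thought_graph(A, X, params))
print(reward(True, 0), reward(False, 7.4))
```

The scalar output would serve as the Critic's value estimate inside the PPO loop mentioned above; the reward function mirrors the quoted rule directly.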