Logic Distillation: Learning from Code Function by Function for Decision-making Tasks
Authors: Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that with the assistance of LD, S-LLMs can achieve outstanding results in continuous decision-making tasks, comparable to, or even surpassing, those of L-LLMs. The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation. |
| Researcher Affiliation | Academia | 1 The School of Computer and Artificial Intelligence of Zhengzhou University; 2 Engineering Research Center of Intelligent Swarm Systems, Ministry of Education; 3 National Supercomputing Center in Zhengzhou; 4 Zhejiang University. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Logic Distillation. Input: rules (instructions) x. Parameters: L-LLM p_θL, S-LLM p_θS, retriever p_θR. Output: the decision-making outcome [o_1, o_2, …]. 1: Generate functions f and corresponding user manuals u with the L-LLM by Equation 1. 2: Build the function base D_f from f and u. 3: Initialize O, s. 4: while the decision-making output O of one step does not meet the task requirements do 5: for j in 1, 2, …, J do 6: Retrieve the top-K functions [f_1, …, f_K] with p_θR, x, and s by Equation 2. 7: The S-LLM selects the most suitable function f_j from [f_1, …, f_K] for stage j. 8: Obtain the intermediate result o_j by Equation 4. 9: end for 10: O, s = o_J 11: if an emergency occurs then 12: Generate functions f^E by Equation 6 and add f^E to [f_1, …, f_K]. 13: end if 14: end while |
| Open Source Code | Yes | The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation. |
| Open Datasets | Yes | The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation. |
| Dataset Splits | No | To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs. ... All methods are tested on 200 sets of starting positions. |
| Hardware Specification | No | On the other hand, numerous companies have attempted to develop relatively smaller open-source LLMs, including GLM4-9B [GLM et al., 2024] and LLaMA-7B [Touvron et al., 2023], which are compatible with consumer-grade GPUs like RTX 3090 Ti. In this paper, we refer to LLMs that cannot be deployed on most devices and require invocation through a paid interface as larger LLMs (L-LLMs), in contrast to smaller LLMs (S-LLMs) deployable on consumer-grade GPUs. |
| Software Dependencies | No | To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs. |
| Experiment Setup | No | More specifically, the pursuit game involves two sides, each controlled by a different LLM. One LLM manages three blue dots, while the other one controls an orange dot. Each interaction between the two sides constitutes a step. In each iteration, the blue dots are constrained to move by two units, while the orange dot is restricted to a single unit of movement. The game concludes when the Manhattan distance between all three blue dots and the orange dot is less than 2 units. ... if LLMs make more than seven illegal choices, the game is considered a failure. The upper limit for the number of moves in the game is capped at 100. |
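The control flow of Algorithm 1 (retrieve top-K functions, let the S-LLM select one per stage, execute it, and loop until the task requirement is met) can be sketched as a toy, self-contained Python loop. The retriever here is plain keyword overlap and the "S-LLM selection" simply takes the top-ranked candidate; these stand-ins, and all names below, are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of Algorithm 1 (Logic Distillation). Retrieval and selection
# are keyword-matching stand-ins, not the paper's p_θR retriever or S-LLM.

def retrieve_top_k(function_base, query, k):
    """Rank functions by keyword overlap between the query and each
    function's user manual (stands in for Equation 2)."""
    scored = sorted(
        function_base,
        key=lambda entry: -len(set(query.split()) & set(entry["manual"].split())),
    )
    return scored[:k]

def select_function(candidates, stage):
    """Stand-in for the S-LLM choosing the most suitable function:
    just take the top-ranked candidate."""
    return candidates[0]

def logic_distillation(rules, function_base, state, k=2, max_steps=10):
    """Repeat stage-wise function selection/execution until the goal is met."""
    for _ in range(max_steps):
        for stage, query in enumerate(rules):
            candidates = retrieve_top_k(function_base, query, k)
            chosen = select_function(candidates, stage)
            state = chosen["fn"](state)  # intermediate result o_j (Equation 4)
        if state >= 0:                   # task-specific stopping condition
            return state
    return state

# Function base D_f: each entry pairs a function f with its user manual u
# (the L-LLM generates both in Equation 1; here they are hand-written).
base = [
    {"manual": "move toward target increase position", "fn": lambda s: s + 1},
    {"manual": "move away decrease position", "fn": lambda s: s - 1},
]
result = logic_distillation(["move toward target"], base, state=-3)
```

Starting from state -3, the loop repeatedly retrieves and applies the "move toward target" function until the stopping condition holds.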
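The KD step fine-tunes the S-LLM with LoRA. The core idea of LoRA, a frozen weight matrix plus a trainable low-rank update scaled by alpha/r, can be illustrated in a few lines of NumPy. This shows the general LoRA formulation with illustrative sizes, not the paper's training code.

```python
import numpy as np

# LoRA in a nutshell: freeze W and learn a low-rank update B @ A,
# so the effective weight is W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d, r = 8, 2          # hidden size and LoRA rank (illustrative values)
alpha = 4.0          # LoRA scaling factor

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                 # trainable, initialized to zero

def forward(x):
    # With B initialized to zero, the output matches the frozen model
    # exactly until training updates B.
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(1, d))
assert np.allclose(forward(x), x @ W.T)  # B == 0, so no change yet
```

Only A and B (2 * r * d parameters) are trained, which is why LoRA makes fine-tuning feasible on the consumer-grade GPUs mentioned above.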
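The pursuit-game rules in the setup above (blue dots move up to two units per step, the orange dot one unit, and the game ends when all three blue dots are within Manhattan distance 2 of the orange dot) can be sketched directly. The coordinates and helper names are illustrative, not taken from the paper's environment code.

```python
# Toy sketch of the pursuit-game rules: Manhattan-distance capture
# condition and per-step move budgets. Positions are illustrative.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def legal_move(old, new, budget):
    """Blue dots may move up to 2 units per step, the orange dot up to 1."""
    return manhattan(old, new) <= budget

def game_over(blue_dots, orange):
    """Capture: every blue dot within Manhattan distance < 2 of the orange dot."""
    return all(manhattan(b, orange) < 2 for b in blue_dots)

# Two blue dots adjacent to the orange dot, one still far away: no capture.
far = game_over([(0, 1), (1, 0), (5, 5)], (0, 0))
# All three blue dots adjacent: capture.
caught = game_over([(0, 1), (1, 0), (0, -1)], (0, 0))
```

The remaining termination rules (failure after more than seven illegal choices, and a 100-move cap) would wrap this check in the outer game loop.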