Logic Distillation: Learning from Code Function by Function for Decision-making Tasks

Authors: Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments demonstrate that with the assistance of LD, S-LLMs can achieve outstanding results in continuous decision-making tasks, comparable to, or even surpassing, those of L-LLMs. The code and data for the proposed method are provided for research purposes at https://github.com/Anfeather/Logic-Distillation.
Researcher Affiliation Academia 1 School of Computer and Artificial Intelligence, Zhengzhou University; 2 Engineering Research Center of Intelligent Swarm Systems, Ministry of Education; 3 National Supercomputing Center in Zhengzhou; 4 Zhejiang University. EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1 Logic Distillation
Input: rules (instructions) x. Parameters: L-LLM pθL, S-LLM pθS, retriever pθR. Output: the decision-making outcome [o1, o2, ...].
1: Generate functions f and the corresponding user manual u with the L-LLM by Equation 1.
2: Build the function base Df from f and u.
3: Initialize O, s.
4: while the decision-making output O of one step does not meet the task requirements do
5:   for j in 1, 2, ..., J do
6:     Retrieve the top-K functions [f1, ..., fK] with pθR, x, and s by Equation 2.
7:     The S-LLM selects the most suitable function fj from [f1, ..., fK] for stage j.
8:     Obtain the intermediate result oj by Equation 4.
9:   end for
10:  O, s = oJ
11:  if emergencies arise then
12:    Generate functions fE by Equation 6 and add fE to [f1, ..., fK].
13:  end if
14: end while
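The loop structure of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword-overlap retriever, the `select_fn` callback standing in for the S-LLM's choice, and the `task_done` predicate are all assumed helpers, and Equations 1, 2, 4, and 6 are replaced by stubs.

```python
# Minimal sketch of the Logic Distillation decision loop (Algorithm 1).
# All helper names here are illustrative assumptions, not the paper's code.

def retrieve_top_k(function_base, rules, state, k=3):
    # Stand-in for the retriever p_theta_R (Equation 2): rank functions by
    # naive keyword overlap between their user manuals and the query.
    query = set((rules + " " + state).lower().split())
    scored = sorted(
        function_base,
        key=lambda f: -len(query & set(f["manual"].lower().split())),
    )
    return scored[:k]

def run_logic_distillation(function_base, rules, select_fn, task_done,
                           stages=2, max_steps=10):
    """select_fn stands in for the S-LLM choosing a function per stage;
    task_done checks whether the output meets the task requirements."""
    state, output = "", None
    for _ in range(max_steps):                 # outer while loop (line 4)
        inter = state
        for j in range(stages):                # stages j = 1..J (line 5)
            candidates = retrieve_top_k(function_base, rules, inter)
            chosen = select_fn(candidates, j)  # S-LLM picks f_j (line 7)
            inter = chosen["fn"](inter)        # intermediate o_j (line 8)
        output = state = inter                 # O, s = o_J (line 10)
        if task_done(output):
            return output
    return output

# Toy usage: two trivial "functions" with one-line manuals.
base = [
    {"manual": "move toward target", "fn": lambda s: s + "step "},
    {"manual": "stay put", "fn": lambda s: s},
]
result = run_logic_distillation(
    base, rules="move toward target",
    select_fn=lambda cands, j: cands[0],
    task_done=lambda o: o.count("step") >= 3,
)
```

The emergency branch (lines 11-13) is omitted here; in the paper it extends the retrieved candidate list with newly generated functions fE when an unforeseen situation occurs.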
Open Source Code Yes The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation.
Open Datasets Yes The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation.
Dataset Splits No To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs. ... All methods are tested on 200 sets of starting positions.
Hardware Specification No On the other hand, numerous companies have attempted to develop relatively smaller open-source LLMs, including GLM4-9B [GLM et al., 2024] and LLaMA-7B [Touvron et al., 2023], which are compatible with consumer-grade GPUs like RTX 3090 Ti. In this paper, we refer to LLMs that cannot be deployed on most devices and require invocation through a paid interface as larger LLMs (L-LLMs), in contrast to smaller LLMs (S-LLMs) deployable on consumer-grade GPUs.
Software Dependencies No To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs.
Experiment Setup No More specifically, the pursuit game involves two sides, each controlled by a different LLM. One LLM manages three blue dots, while the other one controls an orange dot. Each interaction between the two sides constitutes a step. In each iteration, the blue dots are constrained to move by two units, while the orange dot is restricted to a single unit of movement. The game concludes when the Manhattan distance between all three blue dots and the orange dot is less than 2 units. ... if LLMs make more than seven illegal choices, the game is considered a failure. The upper limit for the number of moves in the game is capped at 100.
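The pursuit-game rules quoted above reduce to two simple geometric checks. The sketch below encodes them under stated assumptions (integer grid coordinates, helper names invented for illustration); it is not the paper's environment code.

```python
# Sketch of the pursuit game's movement and termination rules as quoted
# above. Coordinates and helper names are assumptions, not the paper's code.

def manhattan(p, q):
    # Manhattan (L1) distance between two grid points.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def legal_move(old, new, max_units):
    # Blue dots may move up to two units per step; the orange dot one unit.
    return manhattan(old, new) <= max_units

def game_over(blue_dots, orange):
    # "The game concludes when the Manhattan distance between all three
    # blue dots and the orange dot is less than 2 units."
    return all(manhattan(b, orange) < 2 for b in blue_dots)
```

Under these rules, a run would also terminate as a failure after more than seven illegal choices or after 100 moves, which a driver loop would track with simple counters.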