Logic Distillation: Learning from Code Function by Function for Decision-making Tasks

Authors: Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments demonstrate that with the assistance of LD, S-LLMs can achieve outstanding results in continuous decision-making tasks, comparable to, or even surpassing, those of L-LLMs. The code and data for the proposed method are provided for research purposes at https://github.com/Anfeather/Logic-Distillation.
Researcher Affiliation Academia 1 School of Computer and Artificial Intelligence, Zhengzhou University; 2 Engineering Research Center of Intelligent Swarm Systems, Ministry of Education; 3 National Supercomputing Center in Zhengzhou; 4 Zhejiang University. EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1 Logic Distillation
Input: rules (instructions) x. Parameters: L-LLM pθL, S-LLM pθS, retriever pθR. Output: the decision-making outcome [o1, o2, ...].
1: Generate functions f and the corresponding user manual u with the L-LLM by Equation 1.
2: Build the function base Df from f and u.
3: Initialize O, s.
4: while the decision-making output O of one step does not meet the task requirements do
5:   for j in 1, 2, ..., J do
6:     Retrieve the top-K functions [f1, ..., fK] with pθR, x, and s by Equation 2.
7:     The S-LLM selects the most suitable function fj from [f1, ..., fK] for stage j.
8:     Obtain the intermediate result oj by Equation 4.
9:   end for
10:  O, s = oJ
11:  if emergencies arise then
12:    Generate functions fE by Equation 6 and add fE to [f1, ..., fK].
13:  end if
14: end while
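The loop structure of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword-overlap retriever, the `select_fn` callback standing in for the S-LLM's choice, and the `task_done` predicate are all assumed helpers, and Equations 1, 2, 4, and 6 are replaced by stubs.

```python
# Minimal sketch of the Logic Distillation decision loop (Algorithm 1).
# All helper names here are illustrative assumptions, not the paper's code.

def retrieve_top_k(function_base, rules, state, k=3):
    # Stand-in for the retriever p_theta_R (Equation 2): rank functions by
    # naive keyword overlap between their user manuals and the query.
    query = set((rules + " " + state).lower().split())
    scored = sorted(
        function_base,
        key=lambda f: -len(query & set(f["manual"].lower().split())),
    )
    return scored[:k]

def run_logic_distillation(function_base, rules, select_fn, task_done,
                           stages=2, max_steps=10):
    """select_fn stands in for the S-LLM choosing a function per stage;
    task_done checks whether the output meets the task requirements."""
    state, output = "", None
    for _ in range(max_steps):                 # outer while loop (line 4)
        inter = state
        for j in range(stages):                # stages j = 1..J (line 5)
            candidates = retrieve_top_k(function_base, rules, inter)
            chosen = select_fn(candidates, j)  # S-LLM picks f_j (line 7)
            inter = chosen["fn"](inter)        # intermediate o_j (line 8)
        output = state = inter                 # O, s = o_J (line 10)
        if task_done(output):
            return output
    return output

# Toy usage: two trivial "functions" with one-line manuals.
base = [
    {"manual": "move toward target", "fn": lambda s: s + "step "},
    {"manual": "stay put", "fn": lambda s: s},
]
result = run_logic_distillation(
    base, rules="move toward target",
    select_fn=lambda cands, j: cands[0],
    task_done=lambda o: o.count("step") >= 3,
)
```

The emergency branch (lines 11-13) is omitted here; in the paper it extends the retrieved candidate list with newly generated functions fE when an unforeseen situation occurs.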
Open Source Code Yes The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation.
Open Datasets Yes The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Logic-Distillation.
Dataset Splits No To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs. ... All methods are tested on 200 sets of starting positions.
Hardware Specification No On the other hand, numerous companies have attempted to develop relatively smaller open-source LLMs, including GLM4-9B [GLM et al., 2024] and LLaMA-7B [Touvron et al., 2023], which are compatible with consumer-grade GPUs like RTX 3090 Ti. In this paper, we refer to LLMs that cannot be deployed on most devices and require invocation through a paid interface as larger LLMs (L-LLMs), in contrast to smaller LLMs (S-LLMs) deployable on consumer-grade GPUs.
Software Dependencies No To perform KD, we initialize 221 sets of starting positions randomly and produce 103,355 sets of outputs with the L-LLM. Subsequently, we fine-tune the S-LLM with LoRA [Hu et al., 2021] based on these outputs.
Experiment Setup No More specifically, the pursuit game involves two sides, each controlled by a different LLM. One LLM manages three blue dots, while the other one controls an orange dot. Each interaction between the two sides constitutes a step. In each iteration, the blue dots are constrained to move by two units, while the orange dot is restricted to a single unit of movement. The game concludes when the Manhattan distance between all three blue dots and the orange dot is less than 2 units. ... if LLMs make more than seven illegal choices, the game is considered a failure. The upper limit for the number of moves in the game is capped at 100.
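The pursuit-game rules quoted above reduce to two simple geometric checks. The sketch below encodes them under stated assumptions (integer grid coordinates, helper names invented for illustration); it is not the paper's environment code.

```python
# Sketch of the pursuit game's movement and termination rules as quoted
# above. Coordinates and helper names are assumptions, not the paper's code.

def manhattan(p, q):
    # Manhattan (L1) distance between two grid points.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def legal_move(old, new, max_units):
    # Blue dots may move up to two units per step; the orange dot one unit.
    return manhattan(old, new) <= max_units

def game_over(blue_dots, orange):
    # "The game concludes when the Manhattan distance between all three
    # blue dots and the orange dot is less than 2 units."
    return all(manhattan(b, orange) < 2 for b in blue_dots)
```

Under these rules, a run would also terminate as a failure after more than seven illegal choices or after 100 moves, which a driver loop would track with simple counters.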