Rational Decision-Making Agent with Learning Internal Utility Judgment

Authors: Yining Ye, Xin Cong, Shizuo Tian, Yujia Qin, Chong Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Game of 24, WebShop, ToolBench and RestBench datasets demonstrate RaDAgent's superiority over baselines, achieving about 7.8% improvement on average. Besides, RaDAgent can also reduce costs (ChatGPT API calls), highlighting its effectiveness and efficiency.
Researcher Affiliation | Academia | 1Tsinghua University 2Renmin University of China
Pseudocode | Yes | Algorithm 1 RaDAgent
Open Source Code | Yes | Our source code is released in https://github.com/OpenBMB/RaD-Agent.
Open Datasets | Yes | We conduct extensive experiments on Game of 24 (Yao et al., 2023), WebShop (Yao et al., 2022a), and ToolBench (Qin et al., 2023c) datasets. ... To verify that our method is robust and applicable to real-world environments, we expand our evaluation to the RestBench dataset (Song et al., 2023)
Dataset Splits | Yes | We use 100, 500, and 500 instances for Game of 24, WebShop, and ToolBench to evaluate the decision-making ability respectively.
Hardware Specification | No | The paper mentions using OpenAI ChatGPT models (gpt-3.5-turbo-0613-16k, GPT-4o-mini, GPT-4) for implementation and experiments, which are accessed via API calls. It does not provide details of the local hardware used to run these experiments or interact with the APIs.
Software Dependencies | Yes | We use OpenAI ChatGPT gpt-3.5-turbo-0613-16k to implement our approach (our designed prompt can refer to Appendix A). ... We implement our method and baseline based on GPT-4o-mini to reduce the API cost ... To validate the effectiveness of different LLMs, we have conducted additional experiments integrating GPT-4 into our RaDAgent
Experiment Setup | Yes | Our approach involves conducting a decision-exploration process 20 times and finally selecting the decision sequence with the highest Elo score as the final decision. For Elo-based Utility Learning, the initial Elo score of the decision step is set as 0.0 and the Elo coefficient r is set as 173.72 according to the vanilla Elo rating system (Elo, 1967). The Elo score of d̂ in Equation 5 is set as 0.0. K in Equation 3 is set as 50. To manage the computational cost of ChatGPT API calls, we set a maximum limit of 12 steps for each decision sequence in a decision-searching process.
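The experiment-setup row fully pins down the Elo machinery (initial score 0.0, coefficient r = 173.72, update factor K = 50). Since the report does not reproduce Equations 3 and 5 themselves, the sketch below assumes the standard logistic Elo expected-score formula with base e, which is consistent with r = 173.72 ≈ 400/ln 10 from the vanilla Elo system; function names are illustrative, not from the paper's code.

```python
import math

R_COEFF = 173.72  # Elo coefficient r from the paper's setup
K = 50            # update factor K (Equation 3 in the paper)

def expected_score(r_a: float, r_b: float) -> float:
    """Modeled probability that decision step a beats decision step b.

    Assumes the base-e logistic form; with r = 400/ln(10) this matches
    the familiar base-10 Elo curve 1 / (1 + 10**((r_b - r_a) / 400)).
    """
    return 1.0 / (1.0 + math.exp((r_b - r_a) / R_COEFF))

def elo_update(r_a: float, r_b: float, outcome_a: float) -> tuple[float, float]:
    """Update both scores after one pairwise comparison.

    outcome_a is 1.0 if a wins, 0.0 if a loses, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + K * (outcome_a - e_a)
    r_b_new = r_b + K * ((1.0 - outcome_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Both decision steps start at the paper's initial score of 0.0;
# at equal ratings the expected score is 0.5, so a win moves the
# winner up by K * 0.5 = 25 points and the loser down by the same.
a, b = elo_update(0.0, 0.0, 1.0)
# a == 25.0, b == -25.0
```

Under this scheme, repeated pairwise comparisons during the 20 decision-exploration runs gradually separate the scores of good and bad decision steps, and the sequence with the highest final Elo score is returned.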