Learning Evolving Tools for Large Language Models

Authors: Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning."
Researcher Affiliation | Collaboration | Institute of Computing Technology, Chinese Academy of Sciences; Tsinghua University; Renmin University of China; Huawei Noah's Ark Lab
Pseudocode | Yes | "Algorithm 1 delineates our customized MCTS process."
Open Source Code | Yes | "Our code is available at https://github.com/Chen-GX/ToolEVO."
Open Datasets | Yes | "Furthermore, for research purposes, we construct a new benchmark ToolQA-D based on ToolQA (Zhuang et al., 2023b) to investigate the impact of tool variability. ... The ToolQA-D benchmark is provided at https://github.com/Chen-GX/ToolEVO."
Dataset Splits | Yes | "Ultimately, our ToolQA-D comprises 7 datasets and 3 sets of API usage (P_c, P_s^in and P_s^OOD), accompanied by a total of 6,234 and 5,884 training samples, 700 and 700 development samples, and 700 and 730 test samples for the Easy and Hard difficulty respectively."
Hardware Specification | Yes | "All experiments are conducted on Ubuntu 22.04 equipped with NVIDIA A100 GPUs."
Software Dependencies | Yes | "Our code mainly depends on Python 3.11 and PyTorch 2.3.0."
Experiment Setup | Yes | "For MCTS, we set c_puct to 1.25, consistent with Silver et al. (2016). We limit the maximum depth of each tree to 15, and set k to 5, which indicates that we will expand 5 child nodes during the expansion phase. ... For self-improved training, we configure a batch size of 512, a learning rate of 2e-5, and specify the training epoch of 8. ... We set the maximum sequence length to 1024 and use a cosine learning rate scheduler with a warm-up rate of 0.03."
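The reported MCTS hyperparameters (c_puct = 1.25, maximum tree depth of 15, k = 5 children per expansion) can be illustrated with a minimal, generic PUCT selection-and-expansion sketch. This is not the paper's Algorithm 1; the `Node` class, `expand`, and the example priors are hypothetical illustrations of how these three constants typically enter an MCTS loop.

```python
import math

C_PUCT = 1.25       # exploration constant, as reported (following Silver et al., 2016)
MAX_DEPTH = 15      # maximum tree depth, as reported
NUM_CHILDREN = 5    # k = 5 child nodes expanded per expansion phase

class Node:
    """A hypothetical MCTS tree node (not the paper's implementation)."""
    def __init__(self, prior, parent=None):
        self.prior = prior          # policy prior P(s, a) for reaching this node
        self.parent = parent
        self.children = []
        self.visit_count = 0
        self.value_sum = 0.0

    def value(self):
        # Mean action value Q(s, a); defined as 0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child):
    # Standard PUCT rule: Q + c_puct * P * sqrt(N_parent) / (1 + N_child).
    exploration = (C_PUCT * child.prior
                   * math.sqrt(parent.visit_count) / (1 + child.visit_count))
    return child.value() + exploration

def select_child(node):
    # Selection phase: pick the child maximizing the PUCT score.
    return max(node.children, key=lambda c: puct_score(node, c))

def expand(node, priors):
    # Expansion phase: attach the top-k (k = 5) candidate actions as children.
    for p in sorted(priors, reverse=True)[:NUM_CHILDREN]:
        node.children.append(Node(prior=p, parent=node))

def select_path(root):
    # Walk down with PUCT until reaching a leaf or the depth limit of 15.
    node, depth, path = root, 0, [root]
    while node.children and depth < MAX_DEPTH:
        node = select_child(node)
        path.append(node)
        depth += 1
    return path
```

For example, expanding a visited root with six candidate priors keeps only the five largest, and with all children unvisited (Q = 0) the PUCT score reduces to the exploration term, so selection follows the highest prior.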