Learning Evolving Tools for Large Language Models
Authors: Guoxin Chen, Zhong Zhang, Xin Cong, Fangda Guo, Yesai Wu, Yankai Lin, Wenzheng Feng, Yasheng Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning. |
| Researcher Affiliation | Collaboration | ¹Institute of Computing Technology, Chinese Academy of Sciences ²Tsinghua University ³Renmin University of China ⁴Huawei Noah's Ark Lab |
| Pseudocode | Yes | Algorithm 1 delineates our customized MCTS process. |
| Open Source Code | Yes | Our code is available at https://github.com/Chen-GX/ToolEVO. |
| Open Datasets | Yes | Furthermore, for research purposes, we construct a new benchmark ToolQA-D based on ToolQA (Zhuang et al., 2023b) to investigate the impact of tool variability. ... The ToolQA-D benchmark is provided at https://github.com/Chen-GX/ToolEVO. |
| Dataset Splits | Yes | Ultimately, our ToolQA-D comprises 7 datasets and 3 sets of API usage (Pc, Psin, and PsOOD), accompanied by a total of 6,234 and 5,884 training samples, 700 and 700 development samples, and 700 and 730 test samples for the Easy and Hard difficulties, respectively. |
| Hardware Specification | Yes | All experiments are conducted on Ubuntu 22.04 equipped with NVIDIA A100 GPUs. |
| Software Dependencies | Yes | Our code mainly depends on Python 3.11 and PyTorch 2.3.0. |
| Experiment Setup | Yes | For MCTS, we set c_puct to 1.25, consistent with Silver et al. (2016). We limit the maximum depth of each tree to 15, and set k to 5, which indicates that we will expand 5 child nodes during the expansion phase. ... For self-improved training, we configure a batch size of 512, a learning rate of 2e-5, and specify 8 training epochs. ... We set the maximum sequence length to 1024 and use a cosine learning rate scheduler with a warm-up rate of 0.03. |
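The MCTS hyperparameters above (c_puct = 1.25, k = 5 expanded children) follow the PUCT selection rule of Silver et al. (2016). A minimal sketch of that scoring rule, assuming the standard AlphaGo-style formulation (function and variable names here are illustrative, not from the ToolEVO codebase):

```python
import math

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.25):
    """PUCT score for one child during MCTS selection.

    Q(s, a): mean value of the child from accumulated rollout values.
    U(s, a): exploration bonus scaled by c_puct and the child's prior,
             shrinking as the child is visited more often.
    """
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# During expansion, the k child nodes with the highest scores are explored;
# an unvisited child's score is purely its exploration term.
```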
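The training schedule quoted above (learning rate 2e-5, cosine scheduler, warm-up rate 0.03) can be sketched as a per-step learning-rate function; this is a generic linear-warmup-plus-cosine-decay implementation under those reported hyperparameters, not the paper's released code:

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.03):
    """Linear warm-up over the first warmup_ratio of steps, then cosine
    decay from base_lr down to zero over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```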