Tool Unlearning for Tool-Augmented LLMs
Authors: Jiali Cheng, Hadi Amiri
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple tool learning datasets and tool-augmented LLMs show that TOOLDELETE effectively unlearns both randomly selected and class-specific tools, while preserving knowledge on remaining tools and maintaining performance on general tasks. |
| Researcher Affiliation | Academia | 1University of Massachusetts Lowell, USA. Correspondence to: Jiali Cheng <jiali EMAIL>, Hadi Amiri <hadi EMAIL>. |
| Pseudocode | No | The paper describes the TOOLDELETE framework with mathematical formulations and detailed textual explanations of its properties and training details. However, it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper refers to public checkpoints of tool-augmented LLMs on Huggingface (TangQiaoYu/ToolAlpaca-7B, ToolBench/ToolLLaMA-2-7b-v2, gorilla-llm/gorilla-openfunctions-v0) as starting points for unlearning. However, it does not provide any explicit statement or link for the source code of the proposed TOOLDELETE methodology itself. |
| Open Datasets | Yes | We experiment with the following datasets and their corresponding LLMs: ToolAlpaca (Tang et al., 2023) is an agent-generated tool learning dataset consisting of 495 tools and 3975 training examples. [...] ToolBench (Qin et al., 2024) consists of more than 16k real-world APIs from 49 categories [...] APIBench (Patil et al., 2023) focuses on APIs that load machine learning models. |
| Dataset Splits | Yes | Then we conduct unlearning experiments with 2%–20% of tools randomly selected as Tf. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions specific models like 'Vicuna-v1.3', 'LLaMA-2 7B', and 'LLaMA 7B', and references a 'Python transformers package' in an example. However, it does not list specific software dependencies with their version numbers required to replicate the experimental setup. |
| Experiment Setup | Yes | We use a learning rate of 10^-5 across all experiments. |
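The Dataset Splits row reports that the forget set Tf is formed by randomly sampling 2%–20% of tools. Since the paper releases no code, the sketch below is only a hypothetical illustration of that split (function name, seeding, and tool placeholders are our assumptions, not the authors'), shown here for ToolAlpaca's 495 tools at the 20% setting.

```python
import random


def split_forget_set(tools, forget_ratio, seed=0):
    """Hypothetical sketch: randomly partition tools into a forget set T_f
    (to be unlearned) and a retain set T_r (knowledge to preserve)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    tools = list(tools)
    rng.shuffle(tools)
    k = int(len(tools) * forget_ratio)
    return tools[:k], tools[k:]


# e.g. 20% of ToolAlpaca's 495 tools selected for unlearning
all_tools = [f"tool_{i}" for i in range(495)]
forget, retain = split_forget_set(all_tools, 0.20)
```

With this split, unlearning is run on `forget` while evaluation checks that performance on `retain` (and on general tasks) is preserved, matching the Research Type row above.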