Tool Unlearning for Tool-Augmented LLMs

Authors: Jiali Cheng, Hadi Amiri

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple tool learning datasets and tool-augmented LLMs show that TOOLDELETE effectively unlearns both randomly selected and class-specific tools, while preserving knowledge of remaining tools and maintaining performance on general tasks.
Researcher Affiliation | Academia | University of Massachusetts Lowell, USA. Correspondence to: Jiali Cheng <jiali EMAIL>, Hadi Amiri <hadi EMAIL>.
Pseudocode | No | The paper describes the TOOLDELETE framework with mathematical formulations and detailed textual explanations of its properties and training details. However, it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper refers to public checkpoints of tool-augmented LLMs on Hugging Face (TangQiaoYu/ToolAlpaca-7B, ToolBench/ToolLLaMA-2-7b-v2, gorilla-llm/gorilla-openfunctions-v0) as starting points for unlearning. However, it does not provide an explicit statement or link for the source code of the proposed TOOLDELETE method itself.
Open Datasets | Yes | We experiment with the following datasets and their corresponding LLMs: ToolAlpaca (Tang et al., 2023) is an agent-generated tool learning dataset consisting of 495 tools and 3,975 training examples. [...] ToolBench (Qin et al., 2024) consists of more than 16k real-world APIs from 49 categories [...] APIBench (Patil et al., 2023) focuses on APIs that load machine learning models.
Dataset Splits | Yes | Then we conduct unlearning experiments with 2–20% of tools randomly selected as Tf.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions specific models such as Vicuna-v1.3, LLaMA-2 7B, and LLaMA 7B, and references the Python transformers package in an example. However, it does not list software dependencies with version numbers required to replicate the experimental setup.
Experiment Setup | Yes | We use a learning rate of 10^-5 across all experiments.
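To make the "Dataset Splits" entry concrete, the sketch below shows one way a forget set could be drawn: randomly selecting a fraction (2–20%, per the quoted setup) of tools as Tf, with the rest retained. This is an illustrative sketch only, not the paper's code; the function name `split_forget_set`, the retain-set name `T_r`, and the fixed seed are assumptions, while the 495-tool count comes from the ToolAlpaca description above.

```python
import random

def split_forget_set(tools, forget_frac, seed=0):
    """Randomly pick a fraction of tools as the forget set Tf;
    the remainder form the retain set. Illustrative only."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    k = max(1, round(len(tools) * forget_frac))
    forget = set(rng.sample(tools, k))
    retain = [t for t in tools if t not in forget]
    return sorted(forget), retain

# ToolAlpaca is described as having 495 tools; 20% is the upper
# end of the quoted 2-20% forget-set range.
tools = [f"tool_{i}" for i in range(495)]
T_f, T_r = split_forget_set(tools, forget_frac=0.20)
```

With a 20% fraction this yields 99 forget tools and 396 retained tools, and the two sets are disjoint by construction.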