Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Authors: Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that multimodal extraction attacks (with an attack success rate of 45.5%) are more successful than either image-only (32%) or text-only attacks (39%). |
| Researcher Affiliation | Academia | Vaidehi Patil, Department of Computer Science, University of North Carolina at Chapel Hill; Yi-Lin Sung, Department of Computer Science, University of North Carolina at Chapel Hill; Peter Hase, Department of Computer Science, University of North Carolina at Chapel Hill; Jie Peng, School of Artificial Intelligence and Data Science, University of Science and Technology of China; Tianlong Chen, Department of Computer Science, University of North Carolina at Chapel Hill; Mohit Bansal, Department of Computer Science, University of North Carolina at Chapel Hill |
| Pseudocode | No | The paper describes the methodologies in detail, such as the data generation pipeline and attack-defense framework, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA |
| Open Datasets | Yes | To address this gap, we first introduce a multimodal unlearning benchmark, UnLOK-VQA (Unlearning Outside Knowledge VQA), as well as an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA |
| Dataset Splits | No | The paper mentions that UnLOK-VQA consists of "500 samples that have been manually filtered and verified by human evaluators" and describes how these samples are used for "efficacy, generalization, and specificity evaluation." However, it does not provide explicit training/test/validation splits for these samples or for the models themselves in a reproducible manner. |
| Hardware Specification | No | The paper mentions using LLaVA-v1.5-7B and LLaVA-v1.5-13B models for experiments but does not specify the underlying hardware (e.g., GPU models, CPU types, or cloud infrastructure) used for training or evaluation. |
| Software Dependencies | No | The paper mentions several models and frameworks such as LLaVA-v1.5, LLaMA-2-7B, SDXL, DIPPER-11B, Grounded SAM, YOLOv9, Flan-T5-XXL, and Sentence-BERT. However, it does not provide specific version numbers for these software components or other ancillary software dependencies like Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | Our experiments utilize LoRA finetuning for information deletion in MLLMs, targeting specific weight matrices in the model's MLP layers... we tune LLaVA-v1.5-7B and LLaVA-v1.5-13B and find that editing the 7th and 9th layers, respectively... We measure Attack-Success@B with B = 20 for each of the attacks... We apply LoRA with a rank of 1 and α of 1. |
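The setup row above specifies a rank-1 LoRA update with α = 1 applied to MLP weight matrices. A minimal sketch of what such an update looks like numerically is shown below; the function name and initialization details are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def lora_delta(d_out, d_in, r=1, alpha=1, seed=0):
    """Rank-r LoRA update: delta_W = (alpha / r) * B @ A.

    Illustrative sketch only; mirrors the paper's reported
    hyperparameters (r=1, alpha=1) but not its implementation.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.01, size=(r, d_in))  # down-projection, trained
    B = np.zeros((d_out, r))                    # up-projection, zero-init
    return (alpha / r) * (B @ A)

# The frozen base weight is left untouched; only the low-rank
# delta would be trained during unlearning-style finetuning.
W = np.eye(4)
W_adapted = W + lora_delta(4, 4)
```

With the conventional zero initialization of the up-projection, the adapted weight equals the frozen weight at the start of finetuning, so deletion-oriented updates accumulate only through the trained low-rank factors.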