Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation

Authors: Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that multimodal extraction attacks (attack success rate of 45.5%) are more successful than either image-only (32%) or text-only (39%) attacks.
Researcher Affiliation | Academia | Vaidehi Patil, Yi-Lin Sung, Peter Hase, Tianlong Chen, and Mohit Bansal: Department of Computer Science, University of North Carolina at Chapel Hill. Jie Peng: School of Artificial Intelligence and Data Science, University of Science and Technology of China.
Pseudocode | No | The paper describes its methodologies in detail, such as the data-generation pipeline and the attack-and-defense framework, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA
Open Datasets | Yes | To address this gap, we first introduce a multimodal unlearning benchmark, UnLOK-VQA (Unlearning Outside Knowledge VQA), as well as an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA
Dataset Splits | No | The paper states that UnLOK-VQA consists of "500 samples that have been manually filtered and verified by human evaluators" and describes how these samples are used for efficacy, generalization, and specificity evaluation. However, it does not provide explicit training/validation/test splits for these samples, or for the models themselves, in a reproducible manner.
Hardware Specification | No | The paper mentions using LLaVA-v1.5-7B and LLaVA-v1.5-13B models for its experiments but does not specify the underlying hardware (e.g., GPU models, CPU types, or cloud infrastructure) used for training or evaluation.
Software Dependencies | No | The paper mentions several models and frameworks, such as LLaVA-v1.5, LLaMA-2-7B, SDXL, DIPPER-11B, Grounded SAM, YOLOv9, Flan-T5-XXL, and Sentence-BERT, but does not provide specific version numbers for these components or for ancillary software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Our experiments utilize LoRA finetuning for information deletion in MLLMs, targeting specific weight matrices in the model's MLP layers... we tune LLaVA-v1.5-7B and LLaVA-v1.5-13B and find that editing the 7th and 9th layers, respectively... We measure Attack-Success@B with B = 20 for each of the attacks... We apply LoRA with a rank of 1 and α of 1.
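As a rough illustration of the Attack-Success@B metric quoted above (with B = 20), here is a minimal sketch. The function name, the boolean per-attempt interface, and the implied success criterion (an attempt counts as a hit when it extracts the deleted answer) are assumptions for illustration; the paper does not specify this implementation.

```python
def attack_success_at_b(attempt_outcomes, b=20):
    """Sketch of Attack-Success@B: fraction of deletion targets for which
    at least one of the first B attack attempts recovers the deleted answer.

    attempt_outcomes: list of per-target lists of booleans, one boolean per
    attack attempt (True = the attempt extracted the deleted answer).
    NOTE: this interface is a hypothetical stand-in, not the paper's code.
    """
    if not attempt_outcomes:
        return 0.0
    hits = sum(1 for attempts in attempt_outcomes if any(attempts[:b]))
    return hits / len(attempt_outcomes)

# Three deletion targets: the first is compromised on its 20th attempt,
# the second never, the third on its 1st attempt.
outcomes = [
    [False] * 19 + [True],
    [False] * 20,
    [True] + [False] * 19,
]
score = attack_success_at_b(outcomes, b=20)  # 2 of 3 targets compromised
```

Under this reading, a reported rate such as 45.5% would mean that roughly 45.5% of deletion targets were recovered by at least one of the 20 attack prompts.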