Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Authors: Vaidehi Patil, Yi-Lin Sung, Peter Hase, Jie Peng, Tianlong Chen, Mohit Bansal
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that multimodal extraction attacks (with an attack success rate of 45.5%) are more successful than either image-only (32%) or text-only attacks (39%). |
| Researcher Affiliation | Academia | Vaidehi Patil, Department of Computer Science, University of North Carolina at Chapel Hill; Yi-Lin Sung, Department of Computer Science, University of North Carolina at Chapel Hill; Peter Hase, Department of Computer Science, University of North Carolina at Chapel Hill; Jie Peng, School of Artificial Intelligence and Data Science, University of Science and Technology of China; Tianlong Chen, Department of Computer Science, University of North Carolina at Chapel Hill; Mohit Bansal, Department of Computer Science, University of North Carolina at Chapel Hill |
| Pseudocode | No | The paper describes the methodologies in detail, such as the data generation pipeline and attack-defense framework, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA |
| Open Datasets | Yes | To address this gap, we first introduce a multimodal unlearning benchmark, UnLOK-VQA (Unlearning Outside Knowledge VQA), as well as an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. The dataset and code are publicly available at https://github.com/Vaidehi99/UnLOK-VQA |
| Dataset Splits | No | The paper mentions that UnLOK-VQA consists of "500 samples that have been manually filtered and verified by human evaluators" and describes how these samples are used for "efficacy, generalization, and specificity evaluation." However, it does not provide explicit training/test/validation splits for these samples or for the models themselves in a reproducible manner. |
| Hardware Specification | No | The paper mentions using LLaVA-v1.5-7B and LLaVA-v1.5-13B models for experiments but does not specify the underlying hardware (e.g., GPU models, CPU types, or cloud infrastructure) used for training or evaluation. |
| Software Dependencies | No | The paper mentions several models and frameworks such as LLaVA-v1.5, LLaMA-2-7B, SDXL, DIPPER-11B, Grounded SAM, YOLOv9, Flan-T5-XXL, and Sentence-BERT. However, it does not provide specific version numbers for these software components or other ancillary software dependencies like Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | Our experiments utilize LoRA finetuning for information deletion in MLLMs, targeting specific weight matrices in the model's MLP layers... we tune LLaVA-v1.5-7B and LLaVA-v1.5-13B and find that editing the 7th and 9th layers, respectively... We measure Attack-Success@B with B = 20 for each of the attacks... We apply LoRA with a rank of 1 and α of 1. |
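The setup row above specifies a rank-1 LoRA update with α = 1 applied to MLP weight matrices. A minimal sketch of what such an update looks like numerically is shown below; the function name and initialization details are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def lora_delta(d_out, d_in, r=1, alpha=1, seed=0):
    """Rank-r LoRA update: delta_W = (alpha / r) * B @ A.

    Illustrative sketch only; mirrors the paper's reported
    hyperparameters (r=1, alpha=1) but not its implementation.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.01, size=(r, d_in))  # down-projection, trained
    B = np.zeros((d_out, r))                    # up-projection, zero-init
    return (alpha / r) * (B @ A)

# The frozen base weight is left untouched; only the low-rank
# delta would be trained during unlearning-style finetuning.
W = np.eye(4)
W_adapted = W + lora_delta(4, 4)
```

With the conventional zero initialization of the up-projection, the adapted weight equals the frozen weight at the start of finetuning, so deletion-oriented updates accumulate only through the trained low-rank factors.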