MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Authors: Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present the first unified platform, MMDT (Multimodal Decoding Trust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. Our platform assesses models from multiple perspectives, including safety, hallucination, fairness/bias, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. We have designed various evaluation scenarios and red teaming algorithms under different tasks for each perspective to generate challenging data, forming a high-quality benchmark. We evaluate a range of multimodal models using MMDT, and our findings reveal a series of vulnerabilities and areas for improvement across these perspectives. |
| Researcher Affiliation | Collaboration | 1University of Illinois at Urbana-Champaign 2University of Chicago 3University of California, Berkeley 4Harvard University 5Massachusetts Institute of Technology 6Virginia Tech 7University of Georgia 8Microsoft Corporation 9Center for AI Safety |
| Pseudocode | No | The paper describes various red-teaming strategies and algorithms in prose across its methodology sections (e.g., Appendix D.1.2 'Red-teaming algorithms', Appendix E.1 'Red Teaming On Text-To-Image Models'). It provides prompt templates, but does not present structured pseudocode or algorithm blocks with labeled steps in a formal algorithm format. |
| Open Source Code | Yes | Our platform and benchmark are available at https://mmdecodingtrust.github.io/. |
| Open Datasets | Yes | We construct the challenging dataset based on the statistics in the COCO-2017 Train split (Lin et al., 2014)... We randomly sampled 10k instances from the Re-LAION-2B-EN-Research-Safe dataset (LAION.ai, 2024)... We use the Selfies&IDs Images Dataset (Roman, 2023)... We curate a Pri-Street-View dataset by collecting 1816 images from Google Street View in Google Maps. |
| Dataset Splits | Yes | Our benchmark includes 1080 and 1170 testing inputs for T2I models and I2T models, respectively... Our benchmark includes 1,776 and 12,232 testing prompts for T2I and I2T, respectively... For T2I models, we collect 681 prompts for object recognition, 813 prompts for attribute recognition, and 1,354 prompts for spatial reasoning. For I2T models, we collect 1,064 images for object recognition, 607 images for attribute recognition, and 277 images for spatial reasoning... Our benchmark includes 800 challenging prompts for T2I models, with 200 prompts for each task, and 960 challenging QA pairs for I2T models, with 240 pairs for each task. |
| Hardware Specification | No | The paper mentions 'integrating optimizations such as vLLM (Kwon et al., 2023) for efficient inference' in its platform design, but it does not specify the hardware (e.g., specific GPU or CPU models) used for running its own experiments or evaluations. |
| Software Dependencies | No | The paper mentions using several models and tools such as 'GPT-4o', 'GPT-4', 'LLaMA3 (AI@Meta, 2024)', 'EasyOCR (Jaided AI, 2024)', 'SD-v2 image inpainting model', 'Grounding DINO', and 'GPT-3.5-turbo'. While some have release years or specific versions, a comprehensive list of programming languages, libraries, and their explicit version numbers for full replication is not provided. |
| Experiment Setup | Yes | For perturbed input prompts, we apply semantic-preserving perturbations (typo) to the source prompt to perform the untargeted attack. We adopt the default hyperparameters for the attack, with a medium corruption severity level set to 3. |
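The "semantic-preserving perturbation (typo)" setup quoted above can be illustrated with a minimal sketch. This is a hypothetical example of an adjacent-character-swap typo perturbation, not the authors' implementation; the function name, the `rate` parameter, and the swap strategy are assumptions for illustration only.

```python
import random


def typo_perturb(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply simple semantic-preserving typo perturbations to a prompt.

    Hypothetical sketch: swaps a fraction of adjacent letter pairs,
    leaving the prompt's overall meaning recoverable by a reader.
    """
    rng = random.Random(seed)
    chars = list(prompt)
    # Candidate positions: indices where two adjacent letters can be swapped.
    positions = [
        i for i in range(len(chars) - 1)
        if chars[i].isalpha() and chars[i + 1].isalpha()
    ]
    # Perturb at least one position, scaled by the requested rate.
    k = max(1, int(len(positions) * rate))
    for i in rng.sample(positions, min(k, len(positions))):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Since the swaps are a permutation of characters, the perturbed prompt keeps the same length and character multiset, which is one simple way to keep such perturbations "semantic-preserving" at the surface level.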