MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Authors: Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present the first unified platform, MMDT (Multimodal Decoding Trust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. Our platform assesses models from multiple perspectives, including safety, hallucination, fairness/bias, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. We have designed various evaluation scenarios and red teaming algorithms under different tasks for each perspective to generate challenging data, forming a high-quality benchmark. We evaluate a range of multimodal models using MMDT, and our findings reveal a series of vulnerabilities and areas for improvement across these perspectives. |
| Researcher Affiliation | Collaboration | 1University of Illinois at Urbana-Champaign 2University of Chicago 3University of California, Berkeley 4Harvard University 5Massachusetts Institute of Technology 6Virginia Tech 7University of Georgia 8Microsoft Corporation 9Center for AI Safety |
| Pseudocode | No | The paper describes various red-teaming strategies and algorithms in prose across its methodology sections (e.g., Appendix D.1.2 'Red-teaming algorithms', Appendix E.1 'Red Teaming On Text-To-Image Models'). It provides prompt templates, but does not present structured pseudocode or algorithm blocks with labeled steps in a formal algorithm format. |
| Open Source Code | Yes | Our platform and benchmark are available at https://mmdecodingtrust.github.io/. |
| Open Datasets | Yes | We construct the challenging dataset based on the statistics in the COCO-2017 Train split (Lin et al., 2014)... We randomly sampled 10k instances from the Re-LAION-2B-EN-Research-Safe dataset (LAION.ai, 2024)... We use the Selfies&IDs Images Dataset (Roman, 2023)... We curate a Pri-Street-View dataset by collecting 1816 images from Google Street View in Google Maps. |
| Dataset Splits | Yes | Our benchmark includes 1080 and 1170 testing inputs for T2I models and I2T models, respectively... Our benchmark includes 1,776 and 12,232 testing prompts for T2I and I2T, respectively... For T2I models, we collect 681 prompts for object recognition, 813 prompts for attribute recognition, and 1,354 prompts for spatial reasoning. For I2T models, we collect 1,064 images for object recognition, 607 images for attribute recognition, and 277 images for spatial reasoning... Our benchmark includes 800 challenging prompts for T2I models, with 200 prompts for each task, and 960 challenging QA pairs for I2T models, with 240 pairs for each task. |
| Hardware Specification | No | The paper mentions 'integrating optimizations such as vLLM (Kwon et al., 2023) for efficient inference' in its platform design, but it does not specify the hardware (e.g., specific GPU or CPU models) used for running its own experiments or evaluations. |
| Software Dependencies | No | The paper mentions using several models and tools such as 'GPT-4o', 'GPT-4', 'LLaMA3 (AI@Meta, 2024)', 'EasyOCR (Jaided AI, 2024)', 'SD-v2 image inpainting model', 'Grounding DINO', and 'GPT-3.5-turbo'. While some have release years or specific versions, a comprehensive list of programming languages, libraries, and their explicit version numbers for full replication is not provided. |
| Experiment Setup | Yes | For perturbed input prompts, we apply semantic-preserving perturbations (typo) to the source prompt to perform the untargeted attack. We adopt the default hyperparameters for the attack, with a medium corruption severity level set to 3. |
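The "semantic-preserving perturbation (typo)" setup quoted above can be illustrated with a minimal sketch. This is a hypothetical example of an adjacent-character-swap typo perturbation, not the authors' implementation; the function name, the `rate` parameter, and the swap strategy are assumptions for illustration only.

```python
import random


def typo_perturb(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply simple semantic-preserving typo perturbations to a prompt.

    Hypothetical sketch: swaps a fraction of adjacent letter pairs,
    leaving the prompt's overall meaning recoverable by a reader.
    """
    rng = random.Random(seed)
    chars = list(prompt)
    # Candidate positions: indices where two adjacent letters can be swapped.
    positions = [
        i for i in range(len(chars) - 1)
        if chars[i].isalpha() and chars[i + 1].isalpha()
    ]
    # Perturb at least one position, scaled by the requested rate.
    k = max(1, int(len(positions) * rate))
    for i in rng.sample(positions, min(k, len(positions))):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Since the swaps are a permutation of characters, the perturbed prompt keeps the same length and character multiset, which is one simple way to keep such perturbations "semantic-preserving" at the surface level.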