MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Authors: Xuannan Liu, Zekun Li, Pei Li, Huaibo Huang, Shuhan Xia, Xing Cui, Linzhi Huang, Weihong Deng, Zhaofeng He

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We further conduct an extensive evaluation of 6 prevalent detection methods and 15 Large Vision-Language Models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose MMD-Agent, a novel approach to integrate the reasoning, action, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization.
Researcher Affiliation | Academia | Beijing University of Posts and Telecommunications; University of California, Santa Barbara; Center for Research on Intelligent Perception and Computing, NLPR, CASIA
Pseudocode | No | The paper describes the MMD-Agent framework and its stages (textual veracity check, visual veracity check, and cross-modal consistency reasoning) and illustrates them with a diagram in Figure 4, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | We open-source datasets and detection codes but do not release data generation codes for safety.
Open Datasets | Yes | We introduce MMFakeBench, the first comprehensive benchmark for evaluating mixed-source MMD. ... (Yue et al., 2024). ... FEVER (Thorne et al., 2018) ... Politifact (Shu et al., 2020) ... Gossipcop (Shu et al., 2020) ... Snopes (Hanselowski et al., 2019) ... MOCHEG (Yao et al., 2023) ... LLMFake (Chen & Shu, 2024) ... EMU (Da et al., 2021) ... Fakeddit (Nakamura et al., 2020) ... MAIM (Jaiswal et al., 2017) ... MEIR (Sabir et al., 2018) ... NewsCLIPpings (Luo et al., 2021) ... COSMOS (Aneja et al., 2023) ... DGM4 (Shao et al., 2023) ... MS-COCO (Lin et al., 2014) and Visual News datasets (Liu et al., 2021). ... COCO-Counterfactuals (Le et al., 2023). ... All datasets provided in this work are licensed under the Attribution Non-Commercial Share Alike 4.0 International (CC BY-NC-SA 4.0) license.
Dataset Splits | Yes | MMFakeBench consists of 11,000 image-text pairs, which are divided into a validation set and a test set following (Yue et al., 2024). The validation set, comprising 1,000 image-text pairs, is intended for hyperparameter selection, while the test set contains 10,000 pairs.
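The 1,000 / 10,000 partition described above can be sketched in a few lines. This is a hypothetical reconstruction, not the paper's released split: the shuffling, the fixed seed, and the use of integer ids are all assumptions for illustration.

```python
import random

# Hypothetical sketch: partition 11,000 image-text pair ids into a
# 1,000-pair validation set (hyperparameter selection) and a
# 10,000-pair test set, as the benchmark describes.
pair_ids = list(range(11_000))
rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(pair_ids)

val_ids = pair_ids[:1_000]    # validation set
test_ids = pair_ids[1_000:]   # test set

assert len(val_ids) == 1_000 and len(test_ids) == 10_000
```

The actual benchmark ships a fixed split, so in practice one would load the released id lists rather than re-shuffle.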
Hardware Specification | Yes | All experiments are performed on eight NVIDIA GeForce 3090 GPUs with PyTorch.
Software Dependencies | Yes | All experiments are performed on eight NVIDIA GeForce 3090 GPUs with PyTorch. ... As for ChatGPT models, we use GPT-3.5 (gpt-3.5-turbo) or GPT-4 (gpt-4-vision-preview) as generators or detectors. As for text-to-image models, we use DALL-E (DALL-E 3), Stable Diffusion (Stable Diffusion XL), and Midjourney (Midjourney V6).
Experiment Setup | Yes | To ensure a fair evaluation, we set the sampling hyperparameters of the off-the-shelf LVLMs (do_sample = False or temperature = 0) to guarantee consistency in the predicted outputs. We adopt the default settings for other hyperparameters, such as max_new_tokens = 512.
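The deterministic decoding the setup relies on (do_sample = False, equivalently temperature → 0) reduces to greedy decoding: each step takes the argmax over the logits, so repeated runs yield identical outputs. A minimal self-contained illustration follows; the toy logits and the sample_token helper are hypothetical, not from the paper or any specific LVLM library.

```python
import math
import random

def sample_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Pick a next-token id from raw logits.

    With do_sample=False (or temperature == 0) this is greedy
    decoding: it always returns the argmax, so the output is
    deterministic across runs -- the behavior the evaluation
    setup requires for consistent predictions.
    """
    if not do_sample or temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise, temperature-scaled softmax sampling.
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    rng = rng or random.Random()
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy logits for four candidate tokens (illustration only).
logits = [0.1, 2.5, 0.3, 1.9]

# Greedy decoding: identical result on every call.
greedy = [sample_token(logits, do_sample=False) for _ in range(5)]
print(greedy)  # [1, 1, 1, 1, 1]
```

In practice, frameworks such as Hugging Face transformers expose the same switch via `model.generate(..., do_sample=False, max_new_tokens=512)`.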