MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Authors: Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, Qing Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess five state-of-the-art knowledge editing methods on three prominent LMMs, revealing that no method excels across all criteria, and that visual and user-specific edits are particularly challenging. MMKE-Bench sets a new standard for evaluating the robustness of multimodal knowledge editing techniques, driving progress in this rapidly evolving field. Also, 'Extensive experiments with various baseline methods and LMMs in both single and sequential editing settings are conducted, revealing several limitations in existing knowledge editing approaches.'
Researcher Affiliation | Academia | 1) State Key Laboratory of General Artificial Intelligence, BIGAI; 2) School of Software & Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University; 3) University of Science and Technology of China; 4) State Key Laboratory of General Artificial Intelligence, Peking University; 5) Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology
Pseudocode | No | The paper describes the methods and processes using natural language descriptions and diagrams like Figure 2 for the construction pipeline, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the 'VLKEB library' (https://github.com/VLKEB/VLKEB) for conducting experiments, but does not explicitly state that the source code for the MMKE-Bench construction methodology described in this paper is being released, nor does it provide a link to such code.
Open Datasets | No | The paper introduces MMKE-Bench, a comprehensive multimodal knowledge editing benchmark consisting of 2,940 pieces of knowledge and 8,363 images. While the paper describes the creation and statistics of this new benchmark, it does not provide a direct URL, DOI, or specific repository name for accessing the dataset.
Dataset Splits | Yes | The statistics of MMKE-Bench are shown in Tab. 2. MMKE-Bench encompasses three classes of edited knowledge, totaling 2,940 knowledge pieces and 8,363 images. The knowledge spans 175 fine-grained types, highlighting the diversity of MMKE-Bench. We split the dataset into training and validation sets at a 4:6 ratio, with the training set reserved solely for specific knowledge editing methods (e.g., SERAC, Mitchell et al. (2022b)).
Hardware Specification | Yes | The experiments are performed on NVIDIA A100/A800 80GB GPUs.
Software Dependencies | No | The paper mentions using 'PyTorch' and the 'VLKEB library' for experiments, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Appendix B includes Tables 10, 11, and 12, which provide detailed hyper-parameters for knowledge editing methods and LMMs on visual entity editing, visual semantic editing, and user-specific editing. These tables specify settings such as Steps, Edit Layer, Optimizer, and Edit LR for models like BLIP2-OPT, MiniGPT-4, and LLaVA-1.5, and methods like FT-LLM, FT-Alignment, KE, SERAC, and MEND.