M^3EL: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking

Authors: Fang Wang, Shenglin Yin, Xiaoying Bai, Minghao Hu, Tianwei Yan, Yi Liang

AAAI 2025

Reproducibility checklist — each entry lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. Experimental results show that the existing models perform far below expectations (ACC of 49.4%-75.8%). Analysis shows that small dataset sizes, insufficient modality task coverage, and limited topic diversity result in poor generalization of multi-modal models. Our dataset effectively addresses these issues, and the CLIPND model fine-tuned with M^3EL shows a significant improvement in accuracy, with an average improvement of 9.3% to 25% across various tasks. [...] We conducted an ablation study, and the experimental results are presented in Figure 3.
Researcher Affiliation: Academia. ¹Peking University, ²Advanced Institute of Big Data, Beijing, ³National University of Defense Technology, ⁴Xinjiang University.
Pseudocode: No. The paper describes the data construction pipeline in Figure 1 but does not present it as structured pseudocode or an algorithm block.
Open Source Code: No. Footnote 1 links 'https://github.com/ww-ffff/M3EL' next to 'M^3EL' when describing the data construction. This link pertains to the dataset, not explicitly to open-source code for the methodology, such as the CLIPND model or the augmented training strategy.
Open Datasets: Yes. To address these obstacles, we propose a dataset construction pipeline and publish M^3EL, a large-scale dataset for MEL. M^3EL includes 79,625 instances, covering 9 diverse multi-modal tasks and 5 different topics. [...] Our dataset is publicly available to facilitate future research. [...] ¹https://github.com/ww-ffff/M3EL
Dataset Splits: Yes. The M^3EL dataset has been divided into training, validation, and test sets in the ratio of 8:1:1.
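The reported 8:1:1 split can be sketched as a seeded shuffle-and-slice over the 79,625 instances (a minimal illustration; the paper does not specify its splitting procedure, ordering, or random seed, so `split_dataset` and `seed` here are assumptions):

```python
import random

def split_dataset(instances, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle instances deterministically, then slice into train/val/test."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    items = list(instances)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test

train, val, test = split_dataset(range(79625))
print(len(train), len(val), len(test))  # 63700 7962 7963
```

Giving the remainder to the test split keeps every instance assigned to exactly one partition even when the ratios do not divide the dataset size evenly.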
Hardware Specification: Yes. We trained on an A100 GPU with 40GB of memory with 35 epochs and a batch size of 256.
Software Dependencies: No. The paper mentions employing the CLIP (ViT-B-32) architecture and the AdamW optimizer but does not specify software versions for libraries such as PyTorch or TensorFlow.
Experiment Setup: Yes. We trained on an A100 GPU with 40GB of memory with 35 epochs and a batch size of 256. The AdamW optimizer was applied with an initial learning rate of 1e-5, betas=(0.9, 0.98), eps=1e-6, and weight decay=0.001.
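The reported hyperparameters correspond to a standard AdamW configuration. A minimal sketch collecting them in one place (the `train_config` name is an assumption; the commented call shows where each value would plug into PyTorch's `torch.optim.AdamW`, which accepts exactly these keyword arguments):

```python
# Training configuration as reported in the paper (values are from the text;
# the dictionary itself is an illustrative assumption, not the authors' code).
train_config = {
    "epochs": 35,
    "batch_size": 256,
    "optimizer": "AdamW",
    "lr": 1e-5,            # initial learning rate
    "betas": (0.9, 0.98),  # Adam moment coefficients
    "eps": 1e-6,           # numerical stability term
    "weight_decay": 0.001, # decoupled L2 regularization
}

# With PyTorch installed, this maps directly onto the optimizer:
# optimizer = torch.optim.AdamW(
#     model.parameters(),
#     lr=train_config["lr"],
#     betas=train_config["betas"],
#     eps=train_config["eps"],
#     weight_decay=train_config["weight_decay"],
# )
```

Note that betas=(0.9, 0.98) and eps=1e-6 depart from PyTorch's AdamW defaults of (0.9, 0.999) and 1e-8, matching the settings commonly used for CLIP fine-tuning.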