Adaptive Multimodal Fusion: Dynamic Attention Allocation for Intent Recognition

Authors: Bo Hu, Kai Zhang, Yanghai Zhang, Yuyang Ye

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our approach significantly outperforms existing state-of-the-art methods. ... We validated our approach on two challenging datasets, demonstrating that it surpasses state-of-the-art methods in the multimodal intent recognition task.
Researcher Affiliation | Academia | RWTH Aachen University; State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Department of Management Science and Information Systems, Rutgers University
Pseudocode | No | The paper describes its methods in prose and uses diagrams (e.g., Figure 2) to illustrate the architecture, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/Freyrlake/MVCL-DAF
Open Datasets | Yes | We validate our proposed framework on two challenging datasets dedicated to intent recognition. The MIntRec (Zhang et al. 2022a) dataset contains 2,224 high-quality samples and 20 intent labels. MIntRec2.0 (Zhang et al. 2024) is a large-scale benchmark for multimodal intent recognition, encompassing 1,245 high-quality dialogues with a total of 15,040 samples spanning text, video, and audio modalities.
Dataset Splits | Yes | We designed two experimental scenarios to test generalization. In the first scenario, MIntRec was used as the training and validation dataset, with MIntRec2.0 serving as the test set; in the second, the roles of the two datasets were swapped. In each scenario, we trained and evaluated our proposed method and all baseline models using random seeds 0 to 4.
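The cross-dataset protocol above (two train/test scenarios, each repeated over seeds 0–4 and averaged) can be sketched as follows. This is a minimal illustration, not the authors' code: `train_and_evaluate` is a hypothetical placeholder for one MVCL-DAF training-and-evaluation run, and the returned accuracies are dummy values.

```python
import random
import statistics

def train_and_evaluate(train_set: str, test_set: str, seed: int) -> float:
    """Hypothetical stand-in for one full training-and-evaluation run.

    The real implementation would build the model, train on `train_set`,
    and report accuracy on `test_set`; here we only fix the seed and
    return a placeholder score to show the protocol's shape.
    """
    random.seed(seed)  # pin every source of randomness for this run
    return 0.70 + random.random() * 0.05  # dummy accuracy in [0.70, 0.75)

# The two generalization scenarios described in the report: train/validate
# on one dataset, test on the other, then swap the roles.
SCENARIOS = [
    ("MIntRec -> MIntRec2.0", "MIntRec", "MIntRec2.0"),
    ("MIntRec2.0 -> MIntRec", "MIntRec2.0", "MIntRec"),
]

def run_protocol() -> dict[str, tuple[float, float]]:
    """Run each scenario with seeds 0..4 and aggregate mean and stdev."""
    results = {}
    for name, train_set, test_set in SCENARIOS:
        scores = [train_and_evaluate(train_set, test_set, seed)
                  for seed in range(5)]  # random seeds 0 to 4
        results[name] = (statistics.mean(scores), statistics.stdev(scores))
    return results
```

Reporting mean and standard deviation over fixed seeds is a common way to make multi-run comparisons reproducible even when, as here, the paper itself does not publish hyperparameters.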
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions using specific models such as BERT and Transformer encoders, but does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | No | The paper describes the methodology and experimental results but does not explicitly state hyperparameter values (e.g., learning rate, batch size, number of epochs) or other specific training configurations in the main text.