Adaptive Multimodal Fusion: Dynamic Attention Allocation for Intent Recognition
Authors: Bo Hu, Kai Zhang, Yanghai Zhang, Yuyang Ye
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach significantly outperforms existing state-of-the-art methods. ... We validated our approach on two challenging datasets, demonstrating that it surpasses state-of-the-art methods in the multimodal intent recognition task. |
| Researcher Affiliation | Academia | RWTH Aachen University, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Department of Management Science and Information Systems, Rutgers University |
| Pseudocode | No | The paper describes methods in prose and uses diagrams (e.g., Figure 2) to illustrate architecture, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Freyrlake/MVCL-DAF |
| Open Datasets | Yes | We validate our proposed framework on two challenging datasets dedicated to intent recognition. MIntRec (Zhang et al. 2022a) contains 2,224 high-quality samples and 20 intent labels. MIntRec2.0 (Zhang et al. 2024) represents a large-scale benchmark dataset designed for multimodal intent recognition. This dataset encompasses 1,245 high-quality dialogues, aggregating a total of 15,040 samples that incorporate text, video, and audio modalities. |
| Dataset Splits | Yes | We designed two experimental scenarios to test generalization. In the first scenario, MIntRec was used as the training and validation dataset, with MIntRec2.0 serving as the test dataset. In the second scenario, the roles of the two datasets were swapped. In each scenario, we trained and evaluated our proposed method and all baseline models using random seeds 0 to 4. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific models like BERT and Transformer encoders, but does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | No | The paper describes the methodology and experimental results but does not explicitly state hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific training configurations in the main text. |
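The cross-dataset protocol from the "Dataset Splits" row can be sketched as a small driver loop. This is a minimal sketch, not the authors' code: `train_and_evaluate` is a hypothetical stand-in for training MVCL-DAF and scoring it; only the two-scenario structure and the seed range 0–4 come from the paper.

```python
import random
import statistics

def train_and_evaluate(train_set, test_set, seed):
    """Hypothetical stand-in for training the model on `train_set`
    (with validation) and scoring it on `test_set` under one seed."""
    random.seed(seed)
    # A real run would fit and evaluate here; a dummy accuracy keeps
    # the loop structure runnable for illustration.
    return random.uniform(0.0, 1.0)

# The two generalization scenarios described in the paper.
scenarios = [
    ("MIntRec", "MIntRec2.0"),   # Scenario 1: train/val on MIntRec
    ("MIntRec2.0", "MIntRec"),   # Scenario 2: roles of the datasets swapped
]

results = {}
for train_set, test_set in scenarios:
    # Each scenario is run with random seeds 0 to 4 and averaged.
    scores = [train_and_evaluate(train_set, test_set, seed) for seed in range(5)]
    results[(train_set, test_set)] = statistics.mean(scores)

print(results)
```

Averaging over five fixed seeds per scenario, as the paper describes, separates genuine cross-dataset effects from run-to-run variance.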