DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis

Authors: Zhixin Han, Mengting Hu, Yinhao Bai, Xunzhi Wang, Bitong Luo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two widely-used datasets demonstrate that our method achieves state-of-the-art performance. Furthermore, our framework substantially outperforms GPT-4o and other multimodal large language models, showcasing its superior effectiveness in multimodal sentiment analysis. [...] Ablation Study
Researcher Affiliation | Collaboration | ¹College of Software, Nankai University; ²JD AI Research, Beijing, China
Pseudocode | No | The paper describes the methodology with equations and a framework overview figure (Figure 2) but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps formatted like code.
Open Source Code | Yes | Our code and data² provide implementation details, including pre-trained models, training details, and training durations across two datasets. ²https://github.com/ZhixinHan/DEQA
Open Datasets | Yes | Following previous studies, we use two widely adopted benchmarks: Twitter2015 and Twitter2017¹ to evaluate DEQA. ¹The basic statistics of these two datasets are provided in Yu and Jiang (2019)'s paper.
Dataset Splits | Yes | Following previous studies, we use two widely adopted benchmarks: Twitter2015 and Twitter2017¹ to evaluate DEQA. ¹The basic statistics of these two datasets are provided in Yu and Jiang (2019)'s paper.
Hardware Specification | Yes | We train our model using an NVIDIA RTX A6000 GPU and implement an early stopping strategy (Prechelt 1998) with a patience of 3 epochs and a threshold of 0.01 to prevent overfitting.
Software Dependencies | No | The paper mentions several models and tools like 'gpt-4-vision-preview', 'AdamW optimizer', 'DeBERTa', 'CLIP', 'deberta-v3-large', 'clip-vit-large-patch14-336', and 'gpt-4o-2024-05-13'. However, it does not provide specific version numbers for ancillary software libraries or programming languages used (e.g., Python, PyTorch versions).
Experiment Setup | Yes | We train our model using an NVIDIA RTX A6000 GPU and implement an early stopping strategy (Prechelt 1998) with a patience of 3 epochs and a threshold of 0.01 to prevent overfitting. The AdamW optimizer (Loshchilov and Hutter 2019) is utilized for training, with a weight decay (Krogh and Hertz 1991) of 0.01. Additionally, we employ a linear learning rate scheduler with a warmup ratio (He et al. 2015) of 0.1 to adjust the learning rate throughout the training process. In the MASC sub-model, we first train the text-only sentiment classification expert. Subsequently, the fine-tuned weights of this expert are used to initialize the text and description sentiment classification expert. For the two text and vision experts, the factor dimension of MFB is set to 1 for attention scoring and 8 for fusion. During the training process, we freeze the CLIP image encoder.
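A minimal sketch of the early-stopping rule quoted above (patience of 3 epochs, improvement threshold of 0.01). The paper does not state which validation metric is monitored or its direction, so the loss-minimization criterion, class name, and example values here are assumptions, not the authors' implementation:

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs in which the
    monitored validation loss fails to improve by at least `threshold`.
    (Sketch of the setup reported in the paper; metric choice assumed.)
    """

    def __init__(self, patience=3, threshold=0.01):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.threshold:
            # Improvement exceeds the threshold: reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical loss curve: improvement stalls after epoch 1, so the
# counter fills over the next three epochs and training halts.
stopper = EarlyStopping(patience=3, threshold=0.01)
losses = [1.0, 0.8, 0.795, 0.793, 0.792]
stop_epoch = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

Resetting the counter only on improvements larger than the threshold (rather than on any improvement) is what makes the 0.01 threshold meaningful; sub-threshold gains such as 0.800 → 0.795 still count toward the patience budget.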