DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis

Authors: Zhixin Han, Mengting Hu, Yinhao Bai, Xunzhi Wang, Bitong Luo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two widely-used datasets demonstrate that our method achieves state-of-the-art performance. Furthermore, our framework substantially outperforms GPT-4o and other multimodal large language models, showcasing its superior effectiveness in multimodal sentiment analysis. [...] Ablation Study
Researcher Affiliation | Collaboration | ¹College of Software, Nankai University; ²JD AI Research, Beijing, China
Pseudocode | No | The paper describes the methodology with equations and a framework overview figure (Figure 2) but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps formatted like code.
Open Source Code | Yes | Our code and data² provide implementation details, including pre-trained models, training details, and training durations across two datasets. ²https://github.com/ZhixinHan/DEQA
Open Datasets | Yes | Following previous studies, we use two widely adopted benchmarks: Twitter2015 and Twitter2017¹ to evaluate DEQA. ¹The basic statistics of these two datasets are provided in Yu and Jiang (2019)'s paper.
Dataset Splits | Yes | Following previous studies, we use two widely adopted benchmarks: Twitter2015 and Twitter2017¹ to evaluate DEQA. ¹The basic statistics of these two datasets are provided in Yu and Jiang (2019)'s paper.
Hardware Specification | Yes | We train our model using an NVIDIA RTX A6000 GPU and implement an early stopping strategy (Prechelt 1998) with a patience of 3 epochs and a threshold of 0.01 to prevent overfitting.
Software Dependencies | No | The paper mentions several models and tools like 'gpt-4-vision-preview', 'AdamW optimizer', 'DeBERTa', 'CLIP', 'deberta-v3-large', 'clip-vit-large-patch14-336', and 'gpt-4o-2024-05-13'. However, it does not provide specific version numbers for ancillary software libraries or programming languages used (e.g., Python, PyTorch versions).
Experiment Setup | Yes | We train our model using an NVIDIA RTX A6000 GPU and implement an early stopping strategy (Prechelt 1998) with a patience of 3 epochs and a threshold of 0.01 to prevent overfitting. The AdamW optimizer (Loshchilov and Hutter 2019) is utilized for training, with a weight decay (Krogh and Hertz 1991) of 0.01. Additionally, we employ a linear learning rate scheduler with a warmup ratio (He et al. 2015) of 0.1 to adjust the learning rate throughout the training process. In the MASC sub-model, we first train the text-only sentiment classification expert. Subsequently, the fine-tuned weights of this expert are used to initialize the text and description sentiment classification expert. For the two text and vision experts, the factor dimension of MFB is set to 1 for attention scoring and 8 for fusion. During the training process, we freeze the CLIP image encoder.
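A minimal sketch of the early-stopping rule quoted above (patience of 3 epochs, improvement threshold of 0.01). The paper does not state which validation metric is monitored or its direction, so the loss-minimization criterion, class name, and example values here are assumptions, not the authors' implementation:

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs in which the
    monitored validation loss fails to improve by at least `threshold`.
    (Sketch of the setup reported in the paper; metric choice assumed.)
    """

    def __init__(self, patience=3, threshold=0.01):
        self.patience = patience
        self.threshold = threshold
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.threshold:
            # Improvement exceeds the threshold: reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical loss curve: improvement stalls after epoch 1, so the
# counter fills over the next three epochs and training halts.
stopper = EarlyStopping(patience=3, threshold=0.01)
losses = [1.0, 0.8, 0.795, 0.793, 0.792]
stop_epoch = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

Resetting the counter only on improvements larger than the threshold (rather than on any improvement) is what makes the 0.01 threshold meaningful; sub-threshold gains such as 0.800 → 0.795 still count toward the patience budget.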