WDMIR: Wavelet-Driven Multimodal Intent Recognition

Authors: Weiyin Gong, Kai Zhang, Yanghai Zhang, Qi Liu, Xinjie Sun, Junyu Lu, Linbo Zhu

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% in accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, yielding a 0.41% increase in recognition accuracy when analyzing subtle emotional cues. Our method significantly improves every metric on MIntRec and MELD-DA, confirming its effectiveness and generalizability.
Researcher Affiliation | Academia | (1) State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; (2) School of Computer Science, Liupanshui Normal University; (3) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. EMAIL, EMAIL, EMAIL. All email domains (.edu.cn) and institutional names indicate academic or public research institutions.
Pseudocode | No | The paper describes its methods with mathematical equations and textual explanations, but contains no clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that source code for the methodology will be released nor links to a code repository.
Open Datasets | Yes | We conduct experiments on two datasets, MIntRec [Zhang et al., 2022a] and MELD-DA [Saha et al., 2020].
Dataset Splits | Yes | MIntRec is a multimodal intent dataset containing text, video, and audio, with 2224 samples and 20 intent categories; it provides 1334, 445, and 445 samples for training, validation, and testing, respectively. MELD-DA is a multi-turn emotional conversation dataset containing text, video, and audio, with 9988 samples and 12 dialogue-act labels; it provides 6991, 999, and 1998 samples for training, validation, and testing, respectively.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types) used to run the experiments; it mentions only pre-trained models and an optimizer.
Software Dependencies | No | The paper names specific pre-trained models (bert-base-uncased, wav2vec2-base-960h, Swin-Transformer), the Torchvision library, and the Adam optimizer, but provides no version numbers for the software stack (e.g., Python, PyTorch/TensorFlow, Torchvision).
Experiment Setup | Yes | Adam [Loshchilov, 2017] is used as the optimizer throughout the experiments. The training batch size is 16, and the validation and test batch sizes are both 8.
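The reported setup (Adam optimizer, training batch size 16, validation/test batch size 8) can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the dummy features, the `Linear` stand-in classifier, and the default Adam learning rate are assumptions; only the optimizer choice and batch sizes come from the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Values reported in the paper's experiment setup.
TRAIN_BATCH = 16  # training batch size
EVAL_BATCH = 8    # validation and test batch size

# Placeholder data: 128 samples of 32-d features, 20 intent classes
# (MIntRec has 20 intent categories; the feature shape is illustrative).
features = torch.randn(128, 32)
labels = torch.randint(0, 20, (128,))
dataset = TensorDataset(features, labels)

train_loader = DataLoader(dataset, batch_size=TRAIN_BATCH, shuffle=True)
eval_loader = DataLoader(dataset, batch_size=EVAL_BATCH)

model = torch.nn.Linear(32, 20)                   # stand-in classifier
optimizer = torch.optim.Adam(model.parameters())  # Adam, as reported
loss_fn = torch.nn.CrossEntropyLoss()

# One training pass over the data.
for x, y in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Note the paper cites Adam via [Loshchilov, 2017]; since no learning rate or weight-decay settings are reported, the sketch leaves Adam at PyTorch defaults.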
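As a sanity check on the dataset-splits row above, the reported train/validation/test counts sum exactly to the stated totals for both datasets; all numbers below are taken directly from the report.

```python
# Reported splits: (train, val, test, stated total).
splits = {
    "MIntRec": (1334, 445, 445, 2224),
    "MELD-DA": (6991, 999, 1998, 9988),
}

for name, (train, val, test, total) in splits.items():
    # Verify the three partitions account for every sample.
    assert train + val + test == total, name
    print(f"{name}: {train} + {val} + {test} = {total}")
```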