Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing

Authors: Pengcheng Zhao, Jinxing Zhou, Yang Zhao, Dan Guo, Yanxiang Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments validate the effectiveness of the proposed modules and loss functions, resulting in a new state-of-the-art parsing performance." Section 4.1 (Experimental Setups, Dataset & Metrics): "Following prior works (Jiang et al. 2022; Lai, Chen, and Yu-Chiang 2023), our experiments are conducted on the Look, Listen, and Parse (LLP) (Tian, Li, and Xu 2020) dataset, which is currently the sole standard dataset used for the AVVP task."
Researcher Affiliation | Academia | School of Computer Science and Information Engineering, Hefei University of Technology. EMAIL, EMAIL
Pseudocode | No | The paper describes the approach using textual descriptions and diagrams (Figure 2) but does not provide any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "The code will be publicly available in https://github.com/PengchengZhao1001/MM-CSE."
Open Datasets | Yes | "Following prior works (Jiang et al. 2022; Lai, Chen, and Yu-Chiang 2023), our experiments are conducted on the Look, Listen, and Parse (LLP) (Tian, Li, and Xu 2020) dataset, which is currently the sole standard dataset used for the AVVP task."
Dataset Splits | Yes | "Following the official data splits, the dataset is divided into 10,000 videos for training, 649 for validation, and 1,200 for testing."
Hardware Specification | No | The paper describes training configurations and feature extraction methods but does not specify the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the AdamW optimizer and pretrained models such as CLIP, R(2+1)D, and CLAP, but does not list software dependencies with version numbers (e.g., Python, PyTorch, or CUDA).
Experiment Setup | Yes | "Our model is trained for 60 epochs with a batch size of 64 using the AdamW optimizer, with an initial learning rate of 3e-4 and a weight decay of 1e-3. Feature dimensions d1 and d2 are set to 256 and 128, respectively. We use L = 4 stacked FGSE layers. The hyperparameters λ1 and λ2 in Eq. 12 are empirically set to 0.1."
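For readers attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a minimal sketch based only on the values reported above; the names `TrainConfig` and `make_optimizer_kwargs` are illustrative, not from the paper or its released code.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Values quoted from the paper's experiment setup.
    epochs: int = 60
    batch_size: int = 64
    lr: float = 3e-4           # initial learning rate
    weight_decay: float = 1e-3
    d1: int = 256              # feature dimension d1
    d2: int = 128              # feature dimension d2
    num_fgse_layers: int = 4   # L = 4 stacked FGSE layers
    lambda1: float = 0.1       # loss weight λ1 in Eq. 12
    lambda2: float = 0.1       # loss weight λ2 in Eq. 12

def make_optimizer_kwargs(cfg: TrainConfig) -> dict:
    """Keyword arguments for an AdamW-style optimizer
    (e.g. torch.optim.AdamW(model.parameters(), **kwargs))."""
    return {"lr": cfg.lr, "weight_decay": cfg.weight_decay}

cfg = TrainConfig()
print(make_optimizer_kwargs(cfg))  # {'lr': 0.0003, 'weight_decay': 0.001}
```

Since the paper does not pin library versions, anything beyond these scalar settings (optimizer internals, scheduler, initialization) would have to be confirmed against the released repository.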