Enhancing Multimodal Affective Analysis with Learned Live Comment Features

Authors: Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through comprehensive experimentation on a wide range of affective analysis tasks (sentiment, emotion recognition, and sarcasm detection) in both English and Chinese, we demonstrate that these synthetic live comment features significantly improve performance over state-of-the-art methods."
Researcher Affiliation | Academia | Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown, Columbia University (EMAIL, EMAIL, EMAIL)
Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code, Dataset, and Appendix" https://github.com/dzy49/AffectiveLiveComm
Open Datasets | Yes | "Code, Dataset, and Appendix" https://github.com/dzy49/AffectiveLiveComm
Dataset Splits | Yes | "We sample 10% of our data for validation."
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running its experiments.
Software Dependencies | No | The paper mentions software components and pre-trained models such as Chinese-RoBERTa-wwm-ext, XLM-RoBERTa-base, Data2Vec-audio-base, HuBERT-base, and TimeSformer-base, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | "For pre-training, we use a segment length σ of 8 seconds, balancing between required context and the length of downstream datasets. We sample 8 frames uniformly from each segment. To reduce noise in the dataset, we employ multiple filtering strategies. First, we exclude comments lacking substantial content, specifically those shorter than 2 characters or those without Chinese characters. Second, we compile a list of low-signal terms and exclude comments containing these words. For user-generated videos, we trim the first and last 15 seconds as they tend to include repetitive comments such as greetings and farewells. For movies, we trim the first and last 5 minutes, and for TV shows, we trim the start and end of each show. Segments containing fewer than 5 live comments are excluded from pre-training to allow efficient GPU batching. For each epoch, we randomly select 5 live comments per segment so that a batch with N samples has K = 5N comments. We sample 10% of our data for validation."
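The filtering and per-epoch sampling rules quoted above can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' released code: the function names, the regex for detecting Chinese characters, and the `LOW_SIGNAL` placeholder list are all assumptions (the paper does not publish its low-signal term list).

```python
import random
import re

# Assumption: CJK Unified Ideographs range approximates "contains Chinese characters".
CJK = re.compile(r"[\u4e00-\u9fff]")
# Placeholder only; the paper's actual low-signal term list is not given.
LOW_SIGNAL = {"233", "hhh"}

def keep_comment(text: str) -> bool:
    """Apply the paper's stated comment filters: drop comments shorter than
    2 characters, without Chinese characters, or containing low-signal terms."""
    if len(text) < 2:
        return False
    if not CJK.search(text):
        return False
    if any(term in text for term in LOW_SIGNAL):
        return False
    return True

def sample_segment_comments(comments, k=5, min_count=5, rng=random):
    """Exclude segments with fewer than min_count filtered comments (for
    efficient GPU batching), then draw k comments for this epoch's batch,
    so a batch of N segments yields K = k * N comments."""
    kept = [c for c in comments if keep_comment(c)]
    if len(kept) < min_count:
        return None  # segment excluded from pre-training
    return rng.sample(kept, k)
```

A batch builder would call `sample_segment_comments` once per 8-second segment each epoch, re-drawing the 5 comments so different epochs see different subsets.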