ALLVB: All-in-One Long Video Understanding Benchmark

Authors: Xichen Tan, Yuanjing Luo, Yunfan Ye, Fang Liu, Zhiping Cai

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have tested various mainstream MLLMs on ALLVB, and the results indicate that even the most advanced commercial models have significant room for improvement. This reflects the benchmark's challenging nature and demonstrates the substantial potential for development in long video understanding.
Researcher Affiliation | Academia | ¹College of Computer Science and Technology, National University of Defense Technology, Changsha, China; ²School of Design, Hunan University, Changsha, China
Pseudocode | No | The paper describes its methodology through a 'construction pipeline' diagram (Figure 1) and descriptive text, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to its dataset: 'Datasets https://huggingface.co/datasets/ALLVB/ALLVB'. However, it does not provide any concrete access to source code for the methodology described in the paper.
Open Datasets | Yes | Datasets https://huggingface.co/datasets/ALLVB/ALLVB
Dataset Splits | Yes | First, we divide ALLVB into a training set and a test set at a 9:1 ratio, containing 1,236 and 140 videos, respectively.
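The stated split (1,236 train / 140 test videos out of 1,376 total) can be reproduced count-wise with a small helper. This is only a sketch: the `split_allvb` name, the seed, and the seeded-shuffle strategy are assumptions, since the paper does not publish its split procedure.

```python
import random

def split_allvb(video_ids, n_train=1236, seed=0):
    """Split video IDs into train/test sets at the paper's 9:1 ratio.

    Hypothetical helper: the paper reports the split sizes (1,236 / 140)
    but not how the assignment was drawn, so the seeded shuffle here is
    an illustrative assumption.
    """
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

# ALLVB has 1,376 videos in total (1,236 + 140 per the paper).
train_ids, test_ids = split_allvb(range(1376))
print(len(train_ids), len(test_ids))  # 1236 140
```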
Hardware Specification | Yes | Open-source models are run locally on an NVIDIA 4090, while closed-source models are accessed via official API.
Software Dependencies | No | The paper mentions several MLLMs (e.g., Otter-I, LLaVA-1.6, GPT-4o, Claude 3.5 Sonnet) and LLM parameters (7B), but it does not specify software versions for programming languages, libraries, or other dependencies like Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | LLM parameters are uniformly set to 7B and all tests are conducted in a 0-shot format. Among them... all models receive 16 frames uniformly sampled across the entire video, along with the corresponding subtitles for these frames. Open-source models are run locally... while closed-source models are accessed via official API. Open-source models answer each Q&A in the video individually, while closed-source models answer all Q&As in a single video at once. Open-source and closed-source models each use the same prompts to ensure fairness. The specific prompt details are as follows:...
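The "16 frames uniformly sampled across the entire video" step can be sketched as below. The `uniform_frame_indices` name and the bin-center strategy are assumptions, since the paper's exact sampling code is not released.

```python
def uniform_frame_indices(total_frames, n_samples=16):
    """Pick n_samples frame indices spread uniformly across a video.

    One common reading of "uniformly sampled": take the center of each of
    n_samples equal-width bins over the frame range. Illustrative sketch,
    not the authors' implementation.
    """
    if total_frames <= n_samples:
        return list(range(total_frames))
    bin_width = total_frames / n_samples
    return [int(bin_width * (i + 0.5)) for i in range(n_samples)]

# e.g. a video with 3,200 frames -> one index per 200-frame bin
print(uniform_frame_indices(3200))  # [100, 300, 500, ..., 3100]
```

For short clips with fewer frames than samples, the helper simply returns every frame rather than duplicating indices.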