ALLVB: All-in-One Long Video Understanding Benchmark
Authors: Xichen Tan, Yuanjing Luo, Yunfan Ye, Fang Liu, Zhiping Cai
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have tested various mainstream MLLMs on ALLVB, and the results indicate that even the most advanced commercial models have significant room for improvement. This reflects the benchmark's challenging nature and demonstrates the substantial potential for development in long video understanding. |
| Researcher Affiliation | Academia | 1. College of Computer Science and Technology, National University of Defense Technology, Changsha, China; 2. School of Design, Hunan University, Changsha, China |
| Pseudocode | No | The paper describes its methodology through a 'construction pipeline' diagram (Figure 1) and descriptive text, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to its dataset: 'Datasets https://huggingface.co/datasets/ALLVB/ALLVB'. However, it does not provide any concrete access to source code for the methodology described in the paper. |
| Open Datasets | Yes | Datasets https://huggingface.co/datasets/ALLVB/ALLVB |
| Dataset Splits | Yes | First, we divide ALLVB into a training set and a test set at a 9:1 ratio, containing 1,236 and 140 videos, respectively. |
| Hardware Specification | Yes | Open-source models are run locally on an NVIDIA 4090, while closed-source models are accessed via official API. |
| Software Dependencies | No | The paper mentions several MLLMs (e.g., Otter-I, LLaVA-1.6, GPT-4o, Claude 3.5 Sonnet) and LLM parameters (7B), but it does not specify software versions for programming languages, libraries, or other dependencies like Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | LLM parameters are uniformly set to 7B and all tests are conducted in a 0-shot format. Among them... all models receive 16 frames uniformly sampled across the entire video, along with the corresponding subtitles for these frames. Open-source models are run locally... while closed-source models are accessed via official API. Open-source models answer each Q&A in the video individually, while closed-source models answer all Q&As in a single video at once. Open-source and closed-source models each use the same prompts to ensure fairness. The specific prompt details are as follows:... |
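The two most mechanical setup details quoted above, the 9:1 train/test split and the uniform sampling of 16 frames per video, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the seed, and the use of NumPy are assumptions, and the exact split the paper reports (1,236 / 140 videos) depends on details not given in the text.

```python
import numpy as np

def split_videos(video_ids, train_ratio=0.9, seed=0):
    """Shuffle video IDs and split them at a 9:1 ratio (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ids = np.array(video_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train].tolist(), ids[n_train:].tolist()

def uniform_frame_indices(total_frames, num_frames=16):
    """Pick num_frames frame indices evenly spaced across the whole video."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int).tolist()

# Example: ALLVB contains 1,376 videos in total (1,236 train + 140 test).
train_ids, test_ids = split_videos([f"vid_{i:04d}" for i in range(1376)])

# Example: sample 16 frames from a video with 90,000 frames (~60 min at 25 fps).
frame_idx = uniform_frame_indices(total_frames=90000, num_frames=16)
```

Uniform temporal sampling like this gives every model the same sparse view of the full video, which is why the subtitles corresponding to the sampled frames are supplied alongside them.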