Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models
Authors: Chao Huang, Yushu Shi, Jie Wen, Wei Wang, Yong Xu, Xiaochun Cao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on the UCF-Crime and XD-Violence datasets demonstrate that Ex-VAD significantly outperforms existing state-of-the-art methods. |
| Researcher Affiliation | Academia | 1 Shenzhen Campus of Sun Yat-Sen University, School of Cyber Science and Technology, Shenzhen, China 2 Harbin Institute of Technology, School of Computer Science and Technology, Shenzhen, China. Correspondence to: Xiaochun Cao <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (Figure 2 and Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository in the main text, footnotes, or appendices. |
| Open Datasets | Yes | Datasets. We perform experiments on the UCF-Crime (Sultani et al., 2019) and XD-Violence (Wu et al., 2020) datasets. UCF-Crime consists of 1,900 untrimmed surveillance videos with a total duration of 128 hours, covering 13 real-world anomalies (e.g., abuse, robbery, explosion) and normal activities. XD-Violence contains 4,754 untrimmed videos totaling 217 hours, making it one of the largest multimodal violence detection datasets. |
| Dataset Splits | Yes | UCF-Crime consists of 1,900 untrimmed surveillance videos... In the WSVAD, 1,610 videos are used for training with video-level annotations, while 290 videos are used for testing with frame-level annotations. XD-Violence contains 4,754 untrimmed videos totaling 217 hours... The dataset is divided into 3,954 training videos and 800 testing videos, with video-level labels. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX A100 GPU using PyTorch. |
| Software Dependencies | No | The paper mentions "PyTorch", "CLIP (ViT-B/16)", "BLIP-2", and "Llama-3.1" as software components and models used, but does not specify their version numbers for replication purposes. |
| Experiment Setup | Yes | Key hyperparameters include: σ = 1, τ = 0.07, context length l = 20, window length in LGT-Adapter (64 for XD-Violence, 8 for UCF-Crime), and λ (1×10⁻⁴ for XD-Violence, 1 for UCF-Crime). |
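The reported hyperparameters can be collected into a minimal configuration sketch. This is a hypothetical illustration, not the authors' released code: the function name `exvad_config` and the key names are assumptions; only the numeric values come from the paper.

```python
def exvad_config(dataset: str) -> dict:
    """Return the hyperparameters reported for Ex-VAD on a given benchmark.

    Values are taken from the paper's experiment-setup description;
    the dictionary layout itself is illustrative.
    """
    if dataset not in ("UCF-Crime", "XD-Violence"):
        raise ValueError(f"unknown dataset: {dataset}")

    # Dataset-specific settings: LGT-Adapter window length and loss weight λ.
    per_dataset = {
        "UCF-Crime":   {"window_length": 8,  "lambda_": 1.0},
        "XD-Violence": {"window_length": 64, "lambda_": 1e-4},
    }

    cfg = {
        "sigma": 1,             # σ
        "tau": 0.07,            # temperature τ
        "context_length": 20,   # prompt context length l
    }
    cfg.update(per_dataset[dataset])
    return cfg
```

For example, `exvad_config("XD-Violence")["lambda_"]` would yield `1e-4`, matching the table above.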