Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection

Authors: Hongsong Wang, Andi Xu, Pinle Ding, Jie Gui

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on four datasets demonstrate that our method dramatically outperforms state-of-the-art methods and exhibits superior generalization performance. Experiments conducted on popular human-related anomaly detection datasets demonstrate the superior performance of our proposed method compared to state-of-the-art approaches. Our approach consistently beats the other state-of-the-art methods on the four benchmarks. In order to more comprehensively demonstrate the efficacy of our proposed framework, we undertake ablation studies, examining factors such as DCT, United Association Discrepancy (UAD), Conditioned Embedding (CE), and Mask Completion (MC).
Researcher Affiliation Academia 1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China; 3 School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China; 4 Engineering Research Center of Blockchain Application, Supervision And Management (Southeast University), Ministry of Education, China; 5 Purple Mountain Laboratories, Nanjing 210000, China
Pseudocode Yes Algorithm 1: Training procedure of the proposed framework. Algorithm 2: Inference procedure of the proposed framework.
Open Source Code Yes Code https://github.com/guijiejie/DCMD-main
Open Datasets Yes We conduct experiments on four popular benchmarks: Human-related ShanghaiTech Campus (HR-STC), Human-related CUHK Avenue (HR-Avenue), HR-UBnormal, and UBnormal. Our approach outperforms the recent diffusion-based method MoCoDAD (Flaborea et al. 2023) by 1.0%, 1.0%, 0.6%, and 0.7% in AUC scores on the HR-STC, HR-Avenue, HR-UBnormal, and UBnormal datasets, respectively.
Dataset Splits No The paper does not provide specific training/test/validation splits (e.g., percentages or counts) for the overall datasets (HR-STC, HR-Avenue, HR-UBnormal, UBnormal). It only mentions how motion sequences within a window are split into history and future frames: 'For extracting motion sequences, we employ a window size of 7 frames, where the first 3 frames comprise the historical motion sequences, and the subsequent 4 frames represent the future motion sequences.'
Hardware Specification Yes The experiments are conducted on an NVIDIA GeForce RTX 4090 GPU.
Software Dependencies No The paper mentions using the 'Adam optimizer' but does not specify version numbers for any programming languages, libraries, or other software components used in the implementation.
Experiment Setup Yes We train the network end-to-end using the Adam optimizer with a learning rate of 1e-4 that is decayed every 36 epochs. The diffusion process employs cosine variance scheduling with β1 = 1e-4, βT = 2e-2, and T = 10. We set λ = 0.01. The hidden sizes of the encoder for the reconstruction branch are (512, 256), and the dimension of the hidden embedding is 256. The noise prediction network consisted of 6 layers of motion transformer blocks, where the number of heads is 8, and the hidden dimension is 512. The batch size is set to 4096 for HR-STC and 1024 for HR-Avenue.
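
The window split and the diffusion schedule quoted above can be illustrated with a minimal sketch. It is not taken from the authors' repository. The function names `split_windows`, `cosine_betas`, and `alpha_bars` are hypothetical, the stride of 1 is an assumption (the quoted text gives no stride), and the cosine interpolation between β1 and βT is only one plausible reading of "cosine variance scheduling", since the exact formula is not quoted here. Only the numeric values (window of 7, 3 history frames, β1 = 1e-4, βT = 2e-2, T = 10) come from the rows above.

```python
import math

# Values quoted in the table: 7-frame window, first 3 frames are history.
WINDOW = 7
N_HISTORY = 3


def split_windows(frames, window=WINDOW, n_history=N_HISTORY, stride=1):
    """Slide a fixed-size window over a pose sequence and return
    (history, future) pairs.

    stride=1 is an assumption; the source does not state the stride.
    """
    pairs = []
    for start in range(0, len(frames) - window + 1, stride):
        clip = frames[start:start + window]
        pairs.append((clip[:n_history], clip[n_history:]))
    return pairs


def cosine_betas(beta_1=1e-4, beta_T=2e-2, T=10):
    """Hypothetical cosine-shaped variance schedule rising from beta_1
    to beta_T over T steps. This is an assumed interpolation, not a
    formula confirmed by the paper.
    """
    if T == 1:
        return [beta_1]
    return [
        beta_1 + 0.5 * (beta_T - beta_1) * (1.0 - math.cos(math.pi * t / (T - 1)))
        for t in range(T)
    ]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t), the usual DDPM term that
    controls how much of the clean signal survives at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


if __name__ == "__main__":
    # Integers stand in for per-frame pose vectors.
    for history, future in split_windows(list(range(10))):
        print("history:", history, "future:", future)
    betas = cosine_betas()
    print("betas:", [round(b, 6) for b in betas])
    print("alpha_bars:", [round(a, 6) for a in alpha_bars(betas)])
```

With only T = 10 steps and βT = 2e-2, the cumulative product stays close to 1 under this assumed schedule, so the corrupted future frames would remain near the clean motion. Whether that matches the authors' implementation should be checked against the released code.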