Local Patterns Generalize Better for Novel Anomalies
Authors: Yalong Jiang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on popular benchmark datasets demonstrate the achievement of state-of-the-art performance. Code is available at https://github.com/AllenYLJiang/Local-Patterns-Generalize-Better/. ... 4 Experiments and Results: This section compares the proposed method with state-of-the-art ones and presents ablation studies. |
| Researcher Affiliation | Academia | Yalong Jiang Beihang University Allen EMAIL |
| Pseudocode | Yes | Algorithm 1 Two-Stage Process for Identifying Spatial Local Patterns... Algorithm 2 Algorithm for Converting Declarative Sentences to Interrogative Sentences |
| Open Source Code | Yes | Code is available at https://github.com/AllenYLJiang/Local-Patterns-Generalize-Better/. |
| Open Datasets | Yes | Experiments are conducted on seven datasets. The training sets of ShanghaiTech, Avenue and UCSD Ped2 contain only normal events and anomalies reside in test data. (1) ShanghaiTech dataset Liu et al. (2018)... (2) CUHK Avenue dataset Lu et al. (2013)... (3) Ubnormal dataset Acsintoae et al. (2022)... (4) NWPU Campus dataset Cao et al. (2023)... (5) UCSD Ped2 dataset Li et al. (2014)... (6) UCF Crime dataset Sultani et al. (2018)... (7) XD Violence Wu et al. (2020)... SMM, with $N_{SMM} = 3$ state machines, is trained on the COCO-Caption dataset Lin et al. (2014). |
| Dataset Splits | Yes | (1) ShanghaiTech dataset Liu et al. (2018) includes 330 training videos and 107 test videos. (2) CUHK Avenue dataset Lu et al. (2013) involves 16 training videos and 21 test videos. (3) Ubnormal dataset Acsintoae et al. (2022) is divided into a training set with 268 videos, a validation set with 64 videos, and a test set with 211 videos. (4) NWPU Campus dataset Cao et al. (2023) comprises 43 scenes, 28 classes of anomalies and 16 hours of video footage. (5) UCSD Ped2 dataset Li et al. (2014) contains 16 normal training videos and 12 test videos. (6) UCF Crime dataset Sultani et al. (2018) includes 1610 training videos, of which 800 contain only normal behaviors. The test set includes 290 videos, of which 140 include anomalies. (7) XD Violence Wu et al. (2020) includes 4754 videos, of which 2349 are non-violent and 2405 are violent. There are 3954 training videos and 800 test videos, of which 500 are violent. |
| Hardware Specification | Yes | Implementations are based on PyTorch (Pytorch, 2018) and an NVIDIA A100 GPU... All experiments are conducted on an NVIDIA A100 GPU and an Intel(R) Xeon(R) Gold 6248R CPU. |
| Software Dependencies | No | Implementations are based on PyTorch (Pytorch, 2018) and an NVIDIA A100 GPU. RM is trained on benchmark videos without anomalies. The influence of RM's number of layers will be shown in Appendix F. The evaluations of operational efficiency will be detailed in Appendix H. ... To enhance the spatial local patterns obtained from Stage 2, this paper proposes to encode frames into H.265 (HEVC) videos using FFmpeg Zeng et al. (2016). ... $N_{vocabulary}$ is the vocabulary size, according to the BERT tokenizer Devlin et al. (2018). ... The conversion is based on the nltk library Hardeniya et al. (2016). |
| Experiment Setup | Yes | To capture more contexts, bounding boxes are expanded by 50% on both sides horizontally and vertically... For image region $i$ at $t$, the outputs of the backbone and ITAM are $H^I_i(t) \in \mathbb{R}^{S_d \times V_d}$ and $F^I_i(t) \in \mathbb{R}^{N_q \times H_d}$, which satisfy $S_d = 257$, $V_d = 1408$, $N_q = 32$, $H_d = 768$... RM has $D_h = 512$ in intermediate layers. SMM, with $N_{SMM} = 3$ state machines, is trained on the COCO-Caption dataset Lin et al. (2014)... All weights are initialized with distribution $\mathcal{N}(0, 0.02)$. Training spans 20 epochs with initial learning rate $5 \times 10^{-5}$ and decay 0.99. RM takes concatenated $G^I_i(t)$ and $M^I_i(t)$ as input, with ReLU activations. It is trained using the Adam optimizer with learning rate $10^{-3}$ for 10 epochs, using MSE loss. |
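
Two of the settings quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the released code. It assumes that "expanded by 50% on both sides" means each edge moves outward by half of the box's width or height (clipped to the frame), and that the 0.99 decay is applied once per epoch as an exponential schedule. The function names `expand_box` and `learning_rate` are hypothetical.

```python
def expand_box(box, frame_w, frame_h, ratio=0.5):
    """Grow an (x1, y1, x2, y2) box outward on every side.

    Assumption: each side moves by ratio * box width (horizontal)
    or ratio * box height (vertical); the result is clipped to the frame.
    """
    x1, y1, x2, y2 = box
    dx = ratio * (x2 - x1)  # horizontal margin added on the left and right
    dy = ratio * (y2 - y1)  # vertical margin added on the top and bottom
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(frame_w), x2 + dx),
        min(float(frame_h), y2 + dy),
    )


def learning_rate(epoch, base_lr=5e-5, decay=0.99):
    """Learning rate at a given epoch.

    Assumption: the decay factor is applied once per epoch.
    """
    return base_lr * decay ** epoch


if __name__ == "__main__":
    # A 100 x 100 box inside a 640 x 480 frame grows to 200 x 200.
    print(expand_box((100, 100, 200, 200), 640, 480))
    # Learning rates over the 20 training epochs quoted above.
    for epoch in range(20):
        print(epoch, learning_rate(epoch))
```

If the paper instead intends a 50% total increase in width and height, the margin per side would be a quarter of the box size, so `ratio=0.25` would be the corresponding setting in this sketch.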