Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion

Authors: Dexuan Ding, Lei Wang, Liyun Zhu, Tom Gedeon, Piotr Koniusz

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our method in video anomaly detection, demonstrating improvements in both performance and interpretability over traditional feature-level fusion techniques. Section 4 (Experiments): Datasets, Metrics, Baselines, Evaluation. Figure 4: Hyperparameter evaluations for (a) cut-off threshold α, (b) top k maximum degrees, and (c) λ in the regularization term across all four video anomaly detection datasets, using I3D visual features and text features in our EGO fusion framework. Table 1: Experimental results on feature-level and graph-level fusion across four video anomaly detection datasets... Table 2: Comparison of MTN fusion (feature-level) and EGO fusion (graph-level).
Researcher Affiliation | Academia | 1 Australian National University, 2 Data61/CSIRO, 3 Curtin University. Corresponding author (EMAIL).
Pseudocode | No | The paper describes its methods using mathematical formulations (e.g., equations 1-7) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper gives no explicit statement about releasing source code and no link to a code repository for the described methodology.
Open Datasets | Yes | We select the following datasets for our evaluation: (i) UCSD Ped2 (Ped2)... (ii) ShanghaiTech (ShT)... (iii) CUHK Avenue (Avenue)... (iv) Street Scene (Street)... We use popular models pretrained on Kinetics-400 (Kay et al., 2017)... pretrained on VATEX (Wang et al., 2019b)... XD-Violence... UCF-Crime... Multi-Scenario Anomaly Detection (MSAD): Introduced by Zhu et al. (2024)...
Dataset Splits | Yes | UCSD Ped2 features 16 training and 12 testing videos... ShanghaiTech has 330 training and 107 testing videos... CUHK Avenue includes 16 training and 21 testing videos... Street Scene comprises 46 training and 35 testing videos... XD-Violence: This dataset contains 3,954 training videos and 800 testing videos... UCF-Crime: This dataset includes 1,610 training videos and 290 testing videos... Multi-Scenario Anomaly Detection (MSAD): ...360 training videos and 360 testing videos (Protocol ii)...
Hardware Specification | Yes | Training times for one epoch (in seconds) with a batch size of 32 on an Nvidia RTX 4070 GPU are also reported... All experiments are conducted on a single Nvidia V100 GPU with a batch size of 32.
Software Dependencies | No | The paper mentions various models and frameworks such as I3D, C3D, SwinT, SwinBERT, SimCSE, OpenPose, ST-GCN, PyTorch, and TensorFlow, but does not provide version numbers for any of these software components.
Experiment Setup | Yes | We set training epochs to 30-50, depending on the datasets... For simplicity, we set N to 32. We use cosine similarity to create relationship graphs in our experiments. Table 3: Optimal hyperparameters for each dataset (columns: Dataset, Operator, m, n, λ, α, k, Best AUC). We set both P (for I3D visual features) and Q (for SimCSE text embeddings) in equation 5 to range from 1 to 10...
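
The Experiment Setup entry notes that cosine similarity is used to create relationship graphs, and Figure 4 refers to a cut-off threshold α. A minimal plain-Python sketch of that idea follows. The function names, the toy inputs, and the way α is applied here (zeroing similarities below it) are assumptions for illustration only, not the authors' implementation, which the paper does not release.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relationship_graph(features, alpha=0.0):
    """Build a dense similarity graph over a list of feature vectors.

    Entry (i, j) holds the cosine similarity of features i and j.
    Assumption: similarities below the cut-off `alpha` are set to 0.0;
    the paper may apply its threshold differently.
    """
    n = len(features)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = cosine_similarity(features[i], features[j])
            graph[i][j] = s if s >= alpha else 0.0
    return graph
```

Calling `relationship_graph(features, alpha=0.5)` on a list of per-snippet feature vectors returns a square list-of-lists adjacency matrix. One such graph could be built per modality (for example, visual and text features) before any graph-level fusion step.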