Hybrid Spiking Vision Transformer for Object Detection with Event Cameras

Authors: Qi Xu, Jie Deng, Jiangrong Shen, Biwu Chen, Huajin Tang, Gang Pan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that HsVT outperforms existing SNN methods and achieves competitive performance compared to ANN-based models, with fewer parameters and lower energy consumption. ... We evaluate the proposed HsVT model through a series of experiments. First, we describe the datasets and experimental setup. Second, we perform ablation studies to validate the effectiveness of the proposed components. Third, we compare the performance of HsVT with other methods on multiple datasets.
Researcher Affiliation | Collaboration | 1School of Computer Science and Technology, Dalian University of Technology, Dalian, China 2Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China 3National Key Lab of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China 4State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China 5Shanghai Radio Equipment Research Institute, Shanghai, China 6College of Computer Science and Technology, Zhejiang University, Hangzhou, China. Correspondence to: Jiangrong Shen <EMAIL>, Gang Pan <EMAIL>.
Pseudocode | No | The paper describes the methodology using text and architectural diagrams (e.g., Figure 2, Figure 3, Figure 5), but no distinct pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper states that a dataset is publicly available: "The dataset is publicly available at: our Dropbox repository". However, there is no explicit statement or link provided for the open-source code for the methodology (HsVT model) itself.
Open Datasets | Yes | To support research in this area, we developed the Fall Detection dataset as a benchmark for event-based object detection tasks. ... The dataset is publicly available at: our Dropbox repository. Several public datasets have been developed for event-based object detection. The Prophesee GEN1 Automotive Detection Dataset (De Tournemire et al., 2020), collected using the Prophesee GEN1 sensor (304 × 240), provides 39 hours of driving data with annotations for pedestrians and cars.
Dataset Splits | No | The paper discusses the datasets used (GEN1, FALL Detection, Aircraft Detection) and various time intervals for event accumulation, but it does not specify any training, validation, or test dataset splits or percentages required for reproducibility.
Hardware Specification | Yes | We train HsVT on two NVIDIA GeForce RTX 4090 GPUs, using batch size of 8 for the tiny model, and batch sizes of 4 for the basic and small model.
Software Dependencies | No | The paper mentions using the ADAM optimizer and the OneCycle learning rate scheduling strategy, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used for implementation.
Experiment Setup | Yes | We adopted the popular ADAM optimizer (Kingma & Ba, 2014) and utilized the OneCycle learning rate scheduling strategy (Smith & Topin, 2019). This strategy starts from the maximum learning rate and linearly decays during training, effectively accelerating the training speed of neural networks. Mixed Precision Training (Micikevicius et al., 2017). To speed up training time and reduce memory usage without sacrificing model accuracy, we utilize mixed precision training techniques. ... using batch size of 8 for the tiny model, and batch sizes of 4 for the basic and small model.
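
The training setup quoted in the Hardware Specification and Experiment Setup rows can be collected into a small configuration sketch. The following is a minimal, framework-free illustration and not the authors' code, which is not released. The batch sizes, GPU count, optimizer name, and mixed-precision flag come from the rows above. The names `TrainConfig`, `make_config`, and `linear_decay_lr`, and the values of `max_lr` and `total_steps`, are hypothetical placeholders, since the summary reports neither a learning rate nor a training length. The schedule follows the wording quoted above (start at the maximum rate, decay linearly); the one-cycle policy of Smith & Topin also includes a warm-up phase, which this sketch omits.

```python
from dataclasses import dataclass

# Batch sizes quoted in the summary for each model variant.
BATCH_SIZES = {"tiny": 8, "small": 4, "basic": 4}


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported training settings."""
    variant: str
    batch_size: int
    num_gpus: int = 2             # two RTX 4090 GPUs, per the summary
    optimizer: str = "adam"       # ADAM optimizer, per the summary
    mixed_precision: bool = True  # mixed precision training, per the summary
    max_lr: float = 1e-3          # ASSUMPTION: not reported in the summary
    total_steps: int = 1000       # ASSUMPTION: not reported in the summary


def make_config(variant: str) -> TrainConfig:
    """Build a config for one of the three model variants."""
    if variant not in BATCH_SIZES:
        raise ValueError(f"unknown variant: {variant!r}")
    return TrainConfig(variant=variant, batch_size=BATCH_SIZES[variant])


def linear_decay_lr(step: int, max_lr: float, total_steps: int) -> float:
    """Learning rate that starts at max_lr and decays linearly to zero.

    This mirrors the description quoted in the summary only; it has no
    warm-up phase and is not a full one-cycle implementation.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return max_lr * (1.0 - step / total_steps)


if __name__ == "__main__":
    cfg = make_config("tiny")
    for step in (0, cfg.total_steps // 2, cfg.total_steps):
        lr = linear_decay_lr(step, cfg.max_lr, cfg.total_steps)
        print(f"step={step:5d} lr={lr:.6f}")
```

Whether the quoted batch sizes are per GPU or in total is not stated in the summary, so the sketch stores them as given without deriving a global batch size.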