Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
Authors: Qi Xu, Jie Deng, Jiangrong Shen, Biwu Chen, Huajin Tang, Gang Pan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that HsVT outperforms existing SNN methods and achieves competitive performance compared to ANN-based models, with fewer parameters and lower energy consumption. ... We evaluate the proposed HsVT model through a series of experiments. First, we describe the datasets and experimental setup. Second, we perform ablation studies to validate the effectiveness of the proposed components. Third, we compare the performance of HsVT with other methods on multiple datasets. |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Technology, Dalian University of Technology, Dalian, China 2Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China 3National Key Lab of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China 4State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China 5Shanghai Radio Equipment Research Institute, Shanghai, China 6College of Computer Science and Technology, Zhejiang University, Hangzhou, China. Correspondence to: Jiangrong Shen <EMAIL>, Gang Pan <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using text and architectural diagrams (e.g., Figure 2, Figure 3, Figure 5), but no distinct pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper states that a dataset is publicly available: "The dataset is publicly available at: our Dropbox repository". However, it provides no explicit statement or link for open-source code implementing the method (the HsVT model) itself. |
| Open Datasets | Yes | To support research in this area, we developed the Fall Detection dataset as a benchmark for event-based object detection tasks. ... The dataset is publicly available at: our Dropbox repository. Several public datasets have been developed for event-based object detection. The Prophesee GEN1 Automotive Detection Dataset (De Tournemire et al., 2020), collected using the Prophesee GEN1 sensor (304×240), provides 39 hours of driving data with annotations for pedestrians and cars. |
| Dataset Splits | No | The paper discusses the datasets used (GEN1, Fall Detection, Aircraft Detection) and the time intervals used for event accumulation, but it does not specify the training, validation, or test splits (or their percentages) needed for reproducibility. |
| Hardware Specification | Yes | We train HsVT on two NVIDIA GeForce RTX 4090 GPUs, using batch size of 8 for the tiny model, and batch sizes of 4 for the basic and small model. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer and the One Cycle learning rate scheduling strategy, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used for implementation. |
| Experiment Setup | Yes | We adopted the popular ADAM optimizer (Kingma & Ba, 2014) and utilized the One Cycle learning rate scheduling strategy (Smith & Topin, 2019). This strategy starts from the maximum learning rate and linearly decays during training, effectively accelerating the training speed of neural networks. ... Mixed Precision Training (Micikevicius et al., 2017): To speed up training time and reduce memory usage without sacrificing model accuracy, we utilize mixed precision training techniques. ... using batch size of 8 for the tiny model, and batch sizes of 4 for the basic and small model. |
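
The Experiment Setup row describes a learning rate that starts at its maximum and decays linearly during training. A minimal plain-Python sketch of that schedule follows. It is an illustration, not the authors' code: the function name `linear_decay_lr`, the peak rate `1e-3`, the step count, and the final rate of `0.0` are assumptions that the quoted text does not specify. Only the batch sizes are taken from the paper.

```python
# Batch sizes as quoted from the paper (per model variant).
BATCH_SIZE = {"tiny": 8, "basic": 4, "small": 4}


def linear_decay_lr(step: int, total_steps: int, max_lr: float,
                    final_lr: float = 0.0) -> float:
    """Return the learning rate at `step` for a linear decay schedule.

    The rate equals `max_lr` at step 0 and reaches `final_lr` at the
    last step (`total_steps - 1`). Out-of-range steps are clamped.
    """
    if total_steps < 2:
        return max_lr
    # Clamp so that callers cannot step outside the schedule.
    step = min(max(step, 0), total_steps - 1)
    frac = step / (total_steps - 1)
    return max_lr + frac * (final_lr - max_lr)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the schedule.
    total_steps, max_lr = 101, 1e-3
    for s in (0, 25, 50, 75, 100):
        print(s, linear_decay_lr(s, total_steps, max_lr))
```

In a PyTorch training loop the same role would normally be played by a built-in scheduler together with automatic mixed precision. The table does not name the framework the authors used, so that pairing is also an assumption.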