SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Authors: Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on sequential and permuted sequential MNIST classification tasks, as well as the Long Range Arena (LRA) benchmark, where our model achieves competitive performance with state-of-the-art SSMs while maintaining high sparsity. Additionally, on the large-scale language modeling task on the WikiText-103 dataset, our model sets a new record in the field of SNNs, demonstrating its scalability. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen; 2 ACSLab, Huawei Technologies Co., Ltd., Shenzhen; 3 School of Mathematical Sciences, Peking University, Beijing; 4 Department of Computer Science, City University of Hong Kong, Hong Kong; 5 Pengcheng Laboratory, Shenzhen |
| Pseudocode | No | The paper describes methods in narrative and mathematical form (e.g., equations 1-12) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/shenshuaijie/SDN |
| Open Datasets | Yes | On the Long Range Arena benchmark task, SpikingSSM achieves competitive performance... On language modeling, our network significantly surpasses existing spiking large language models (spiking LLMs) on the WikiText-103 dataset... Sequential MNIST: The MNIST dataset (Yann and Cortes 1998) comprises 70,000 grayscale images... The sequential MNIST (sMNIST) dataset (Le, Jaitly, and Hinton 2015) is created by flattening the original 2-dimensional images... LRA: The LRA benchmark (Tay et al. 2021b) is proposed for the purpose of benchmarking sequence models under the long-context scenario. ... WikiText-103: The WikiText-103 dataset is a comprehensive collection of text from Wikipedia articles... |
| Dataset Splits | Yes | The MNIST dataset (Yann and Cortes 1998) comprises 70,000 grayscale images of handwritten digits (0-9), divided into 60,000 training and 10,000 testing images, each with a size of 28×28 pixels. |
| Hardware Specification | No | The paper mentions 'a single GPU' for time measurements but does not specify any particular model (e.g., NVIDIA A100, RTX 2080 Ti) or other hardware specifications like CPU, memory, or specific computing environment details. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9', 'CUDA 11.1'). |
| Experiment Setup | Yes | The number of training and testing samples are 10^5 and 10^4, respectively. ... L = 1024 is the sequence length. ... The SDN is a 4-layer CNN constructed by 1-D convolutions... we train SDN on the generated dataset with mean square error (MSE) as the loss function for 100 epochs. ... The inputs are 1-D sequences with varying lengths of L = 1K, 2K, 4K, 8K with batch size of 64. |
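The experiment setup quoted above describes the SDN as a 4-layer CNN built from 1-D convolutions, operating on sequences of length L = 1024 and trained with an MSE loss. A minimal numpy sketch of that forward pass and loss is shown below; the single channel, kernel size of 7, and ReLU activation are illustrative assumptions, not details reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, w):
    """1-D convolution with zero padding so the output length equals the input length."""
    k = len(w)
    pad = k // 2
    xp = np.pad(x, (pad, k - 1 - pad))
    return np.array([xp[i:i + k] @ w for i in range(len(x))])

L = 1024                          # sequence length from the paper
x = rng.standard_normal(L)        # one synthetic 1-D input sequence

h = x
for _ in range(4):                # 4-layer stack of 1-D convolutions
    w = rng.standard_normal(7) * 0.1   # kernel size 7 is an assumption
    h = np.maximum(conv1d_same(h, w), 0.0)  # ReLU is an assumption

target = rng.standard_normal(L)
mse = np.mean((h - target) ** 2)  # MSE loss, as stated in the setup
```

Per the quoted setup, the real training run uses 10^5 generated training samples and 10^4 test samples over 100 epochs; this sketch only illustrates the per-sample shapes and loss.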