Oscillatory State-Space Models
Authors: T. Konstantin Rusch, Daniela Rus
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our empirical results, spanning a wide range of time-series tasks from mid-range to very long-range classification and regression, as well as long-horizon forecasting, demonstrate that our proposed LinOSS model consistently outperforms state-of-the-art sequence models. Notably, LinOSS outperforms Mamba and LRU by nearly 2x on a sequence modeling task with sequences of length 50k. Code: https://github.com/tk-rusch/linoss. |
| Researcher Affiliation | Academia | T. Konstantin Rusch (MIT), Daniela Rus (MIT) |
| Pseudocode | Yes | Algorithm 1 (full LinOSS model). Input: input sequence u. Output: L-block LinOSS output sequence o. u^0 ← Wenc u + benc (encode input sequence); for l = 1, …, L: y^l ← solution of ODE in (1) with input u^(l-1) via parallel scan; x^l ← C y^l + D u^(l-1) (linear readout in (1)); x^l ← GELU(x^l); u^l ← GLU(x^l) + u^(l-1); end for; o ← Wdec y^L + bdec (decode final LinOSS block output). |
| Open Source Code | Yes | Code: https://github.com/tk-rusch/linoss. |
| Open Datasets | Yes | In the first part of the experiments, we focus on a recently proposed long-range sequential benchmark introduced in Walker et al. (2024). This benchmark focuses on six datasets from the University of East Anglia (UEA) Multivariate Time Series Classification Archive (UEA-MTSCA) (Bagnall et al., 2018), selecting those with the longest sequences for increased difficulty. The sequence lengths thereby range from 400 to almost 18k. To this end, we consider the PPG-DaLiA dataset, a multivariate time series regression dataset designed for heart rate prediction using data collected from a wrist-worn device (Reiss et al., 2019). We consider a weather prediction task introduced in Zhou et al. (2021). |
| Dataset Splits | Yes | More concretely, we use the same pre-selected random seeds for splitting the datasets into training, validation, and testing parts (using 70/15/15 splits), as well as tune our model hyperparameters only on the same prescribed grid. We follow Walker et al. (2024) and divide the data into training, validation, and test sets with a 70/15/15 split for each individual. Thus, we simply follow Gu et al. (2021) by setting up LinOSS as a general sequence-to-sequence model that treats forecasting as a masked sequence-to-sequence transformation. We consider a weather prediction task introduced in Zhou et al. (2021). In this task, several climate variables are predicted into the future based on local climatological data. Here, we focus on the hardest task in Zhou et al. (2021) of predicting the future 720 timesteps (hours) based on the past 720 timesteps. |
| Hardware Specification | Yes | All experiments were conducted on Nvidia Tesla V100 GPUs and Nvidia RTX 4090 GPUs, with the exception of the PPG experiment, which was run on Nvidia Tesla A100 GPUs due to higher memory demands. Note that we used exactly the same GPU architecture as well as the same code and Python libraries as in Walker et al. (2024) to ensure fair comparability, i.e., GPU memory usage and runtime were measured on an Nvidia RTX 4090 GPU for all models. |
| Software Dependencies | No | The code to run the experiments is implemented using the JAX auto-differentiation framework (Bradbury et al., 2018). All experiments were conducted on Nvidia Tesla V100 GPUs and Nvidia RTX 4090 GPUs, with the exception of the PPG experiment, which was run on Nvidia Tesla A100 GPUs due to higher memory demands. |
| Experiment Setup | Yes | The hyperparameters of the models were optimized with the same grid search approach from Walker et al. (2024) for the six datasets in Section 4.1 and the PPG dataset of Section 4.2 to ensure perfect comparability with competing methods, i.e., using the grid: learning rate = {0.00001, 0.0001, 0.001}, number of layers = {2, 4, 6}, number of hidden neurons = {16, 64, 128}, state-space dimension = {16, 64, 256}, include time dimension = {True, False}. Note that for the weather dataset, we performed a random search instead of grid search using the same hyperparameter bounds as before, except that we increased the maximum number of LinOSS blocks from 6 to 8. |
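The block structure in Algorithm 1 above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation (which is in JAX): the oscillatory ODE solve of Eq. (1) is replaced by a plain sequential linear recurrence as a stand-in for the parallel scan, and all weights (`A`, `B`, `C`, `D`, `Wg`, the encoder/decoder matrices) are hypothetical random placeholders. Only the encode → L blocks (recurrence, linear readout, GELU, GLU + skip) → decode structure follows the pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def glu(x, W):
    # Gated linear unit: project to 2*d features, sigmoid-gate one half with the other.
    a, g = np.split(x @ W, 2, axis=-1)
    return a / (1.0 + np.exp(-g))

def linoss_block(u_prev, A, B, C, D, Wg):
    # Stand-in for solving the ODE in Eq. (1): a generic linear recurrence
    # y_t = A y_{t-1} + B u_t, run sequentially here (the paper computes the
    # actual oscillatory-ODE solution with a parallel scan).
    T = u_prev.shape[0]
    y = np.zeros((T, A.shape[0]))
    h = np.zeros(A.shape[0])
    for t in range(T):
        h = A @ h + B @ u_prev[t]
        y[t] = h
    x = y @ C.T + u_prev @ D.T      # linear readout in Eq. (1)
    x = gelu(x)
    return glu(x, Wg) + u_prev, y   # GLU plus skip connection

def linoss(u, params, Wenc, benc, Wdec, bdec):
    u_l = u @ Wenc.T + benc         # encode input sequence
    for A, B, C, D, Wg in params:   # L LinOSS blocks
        u_l, y = linoss_block(u_l, A, B, C, D, Wg)
    return y @ Wdec.T + bdec        # decode final block output

# Tiny smoke test: T=32 timesteps, 3 input channels, d=8 hidden, m=4 state, L=2.
T, c, d, m, L = 32, 3, 8, 4, 2
Wenc, benc = rng.normal(size=(d, c)) * 0.1, np.zeros(d)
Wdec, bdec = rng.normal(size=(1, m)) * 0.1, np.zeros(1)
params = [(rng.normal(size=(m, m)) * 0.1, rng.normal(size=(m, d)) * 0.1,
           rng.normal(size=(d, m)) * 0.1, rng.normal(size=(d, d)) * 0.1,
           rng.normal(size=(d, 2 * d)) * 0.1) for _ in range(L)]
o = linoss(rng.normal(size=(T, c)), params, Wenc, benc, Wdec, bdec)
print(o.shape)  # (32, 1)
```

The sequential loop makes the recurrence easy to read; the published code instead uses an associative (parallel) scan over the sequence dimension, which is what makes 50k-length sequences tractable.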
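The grid reported above is small enough to enumerate exactly; as a sketch, the full sweep (parameter names chosen here for illustration) comes to 3 × 3 × 3 × 3 × 2 = 162 configurations per dataset:

```python
import itertools

# Hyperparameter grid from Walker et al. (2024) as quoted above; the key
# names are illustrative, not taken from the released code.
grid = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "num_layers": [2, 4, 6],
    "hidden_size": [16, 64, 128],
    "state_dim": [16, 64, 256],
    "include_time": [True, False],
}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
print(len(configs))  # 162
```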